Microarray for detecting viable organisms

ABSTRACT

A microarray methodology is described that uses the fluorescent DNA intercalating agent propidium monoazide (PMA) to selectively block DNA of dead cells from amplification, together with its application to detecting and enumerating viable microbes in complex microbial communities. In the preferred embodiment, a phylogenetic array is used to enhance the sensitivity of the method. The PMA-Microarray assay is particularly applicable to monitoring samples from environments with extremely low microbial burden, such as spacecraft surfaces.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Application No. 61/441,820, filed Feb. 11, 2011, the content of which is herein expressly incorporated by reference in its entirety.

This application is related to U.S. Application Ser. No. 61/259,565 [Attorney Docket No. IB-2733P1], filed on Nov. 9, 2009; U.S. Application Ser. No. 61/317,644 [Attorney Docket No. IB-2733P2], filed on Mar. 25, 2010; U.S. Application Ser. No. 61/347,817 [Attorney Docket No. IB-2733P3], filed on May 24, 2010; and co-pending U.S. patent application Ser. No. 12/474,204, filed on May 28, 2009, which are hereby incorporated by reference in their entirety.

This application is related to the co-pending international application having application number PCT/US2010/040106 [Attorney Docket No. IB-2733PCT], filed on Jun. 25, 2010, which is incorporated herein by reference.

This application is also related to the co-pending U.S. patent application Ser. No. 13/152,213, filed on Jun. 2, 2011, which is hereby incorporated by reference in its entirety.

STATEMENT OF GOVERNMENTAL SUPPORT

The invention claimed herein was made in the performance of work under DOE and NASA contracts, and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the Contractor has elected to retain title. This invention was made in part under Contract DE-AC02-05CH11231 awarded by the Department of Energy and under Contract Nos. NNH09ZDA001N and NAS7-1407 awarded by NASA. The government has certain rights in this invention.

BACKGROUND

There is a continuing need for a sensitive, rapid method that will detect and enumerate total viable microorganisms. Culture-based plating methods, including the NASA standard assay, have remained the “gold standard,” even though such methods are both laborious and time consuming. Nucleic-acid-based amplification such as the polymerase chain reaction (PCR), particularly real-time quantitative PCR (qPCR), provides a promising approach that allows rapid detection in a wide range of applications. Nonetheless, PCR-based methods are rarely used for process validation, because such methods cannot discriminate between live and inactive microorganisms due to the high stability of the DNA from dead bacterial cells. Detecting the number or kinds of viable microorganisms is useful in a variety of applications.

SUMMARY OF THE INVENTION

The present application provides a method of detecting and enumerating low levels of microorganisms. The method combines a propidium monoazide (PMA) assay with PhyloChip microarray analysis, allowing rapid, selective detection and enumeration of viable microbes in microbial communities containing both live and dead microorganisms.

In one aspect, the present application provides a method for detecting live cells in a sample. In some embodiments, the method comprises selectively amplifying a nucleic acid from live cells from a sample comprising live and dead cells; and detecting the presence, absence, relative abundance, and/or quantity of one or more operational taxon units (OTUs) in the sample based on hybridization of amplified nucleic acid to a plurality of probes complementary to 16S rRNA sequences, wherein said one or more OTUs consist essentially of viable organisms from said sample. In some embodiments, the nucleic acid is DNA. In some embodiments, selectively amplifying a nucleic acid comprises pre-treating the sample with an agent that selectively modifies a nucleic acid of dead cells. In some embodiments, the agent that selectively modifies a nucleic acid of dead cells is a DNA intercalating agent, such as propidium monoazide. In some embodiments, the probes are used to detect the presence, absence, relative abundance, and/or quantity of at least 10,000 different OTUs in a single assay. In some embodiments, the presence, absence, relative abundance, and/or quantity is detected with a confidence level greater than 95%. In some embodiments, the method further comprises quantifying the number of live cells in the sample.

In one aspect, the present application provides a method for detecting viable organisms in a sample. In some embodiments, the method comprises selectively amplifying a nucleic acid from live cells from a sample comprising live and dead cells; and determining the presence, absence, relative abundance, and/or quantity of at least 1,000 different OTUs in a single assay, wherein said OTUs consist essentially of viable organisms from said sample. In some embodiments, selectively amplifying comprises pre-treating the sample with an agent that selectively modifies a nucleic acid of dead cells. In some embodiments, the agent that selectively modifies a nucleic acid of dead cells is a DNA intercalating agent, such as propidium monoazide. In some embodiments, contacting a sample with an agent that selectively modifies a nucleic acid of dead cells in the sample is followed by exposure to visible light. In some embodiments, the presence, absence, relative abundance, and/or quantity is detected with a confidence level greater than 95%. In some embodiments, the method further comprises quantifying the number of live cells in the sample.

In one aspect, the present application provides a method for detecting and optionally quantifying viable organisms in a sample. In some embodiments, the method comprises (a) selectively amplifying a nucleic acid from live cells from a sample comprising live and dead cells; (b) hybridizing amplified nucleic acid to a plurality of probes; (c) determining hybridization signal strength distributions for a plurality of different interrogation probes, each of which is complementary to a section within one or more highly conserved polynucleotides in one or more target OTUs; (d) determining hybridization signal strength distributions for a plurality of mismatch probes, wherein for each interrogation probe, one or more different corresponding mismatch probes comprising one or more nucleotide mismatches with said section within said one or more highly conserved polynucleotides are included in the plurality of mismatch probes; and (e) using the hybridization signal strengths of the interrogation probes and mismatch probes to determine the probability that the hybridization signal for the different interrogation probes represents the presence, absence, relative abundance, and/or quantity of said one or more OTUs, wherein said one or more OTUs consist essentially of viable organisms from said sample. In some embodiments, selectively amplifying comprises pre-treating the sample with an agent that selectively modifies a nucleic acid of dead cells. In some embodiments, the agent that selectively modifies a nucleic acid of dead cells is a DNA intercalating agent, such as propidium monoazide. In some embodiments, contacting a sample with an agent that selectively modifies a nucleic acid of dead cells in the sample is followed by exposure to visible light. In some embodiments, the highly conserved polynucleotides are selected from the group consisting of 16S rRNA gene, 23S rRNA gene, 5S rRNA gene, 5.8S rRNA gene, 12S rRNA gene, 18S rRNA gene, 28S rRNA gene, gyrB gene, rpoB gene, fusA gene, recA gene, coxI gene, nifD gene, RNA molecules derived therefrom, and a combination thereof. In some embodiments, each interrogation probe has 4 or more corresponding mismatch probes in the plurality of mismatch probes. In some embodiments, the probes are used to detect the presence, absence, relative abundance, and/or quantity of at least 10,000 different OTUs in a single assay. In some embodiments, the probes are attached to a substrate, such as a bead or a microsphere. In some embodiments, the probes are attached to a substrate comprising glass, plastic, or silicon. In some embodiments, the presence, absence, relative abundance, and/or quantity is detected with a confidence level greater than 95%.
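Step (e) does not fix a particular statistic. As one hedged illustration, not necessarily the statistic used in the preferred embodiment, the Python sketch below compares the interrogation-probe and mismatch-probe signal distributions for one OTU. It estimates the probability that a randomly chosen interrogation signal exceeds a randomly chosen mismatch signal, which equals the Mann-Whitney U statistic divided by the number of comparisons. The function names, the example intensities, and the 0.95 cutoff are hypothetical.

```python
from itertools import product


def prob_pm_exceeds_mm(pm_signals, mm_signals):
    """Estimate P(X > Y) for X drawn from interrogation (PM) signals and
    Y drawn from mismatch (MM) signals; ties count as one half.

    This is the Mann-Whitney U statistic divided by n_pm * n_mm.
    """
    if not pm_signals or not mm_signals:
        raise ValueError("both signal lists must be non-empty")
    wins = 0.0
    for x, y in product(pm_signals, mm_signals):
        if x > y:
            wins += 1.0
        elif x == y:
            wins += 0.5
    return wins / (len(pm_signals) * len(mm_signals))


def call_otu(pm_signals, mm_signals, cutoff=0.95):
    """Hypothetical presence call: the OTU is reported present when the
    estimated probability exceeds an illustrative cutoff."""
    p = prob_pm_exceeds_mm(pm_signals, mm_signals)
    return p, p > cutoff


# Hypothetical intensities for one OTU's interrogation and mismatch probes.
print(call_otu([900, 1200, 1100, 950], [200, 250, 300, 1000]))
```

In this example, 14 of the 16 PM/MM comparisons favor the interrogation probes, giving 0.875. That falls below the illustrative cutoff, so the OTU is not called present.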

In one aspect, the present application provides a kit for detecting and optionally quantifying live cells in a sample. In some embodiments, the kit comprises (a) an agent that selectively modifies a nucleic acid of dead cells; (b) a plurality of different interrogation probes, each of which is complementary to a section within one or more highly conserved polynucleotides in one or more target operational taxon units (OTUs); and (c) a plurality of mismatch probes, wherein for each interrogation probe, one or more different corresponding mismatch probes comprising one or more nucleotide mismatches with said section within said one or more highly conserved polynucleotides are included in the plurality of mismatch probes. In some embodiments, the agent that selectively modifies a nucleic acid of dead cells is a DNA intercalating agent, such as propidium monoazide. In some embodiments, each interrogation probe has 4 or more corresponding mismatch probes in the plurality of mismatch probes. In some embodiments, the highly conserved polynucleotides are selected from the group consisting of 16S rRNA gene, 23S rRNA gene, 5S rRNA gene, 5.8S rRNA gene, 12S rRNA gene, 18S rRNA gene, 28S rRNA gene, gyrB gene, rpoB gene, fusA gene, recA gene, coxI gene, nifD gene, RNA molecules derived therefrom, and a combination thereof.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1 shows a schematic of the rapid measurement of bacterial spores using the PMA system by a fluorescence method.

FIG. 2 shows a comprehensive flow chart depicting a PMA-Microarray experimental process.

FIG. 3 shows a graph of real-time PCR using an E. coli DNA control before and after treatment with PMA.

FIG. 4 shows a graph of B. pumilus SAFR-032 spores with different treatments. The gene target for qPCR is gyrB.

FIG. 5 shows a graph of heat resistance of B. pumilus spores at 90° C. The D-values for plate and qPCR are 5 and 33 min, respectively.

FIG. 6 shows a graph of the UV inactivation curve of B. pumilus spores. Curves are based on the averages of all results in triplicate.

FIG. 7 illustrates an example of a suitable computer system environment.
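The D-values given for FIG. 5 can be read as the exposure times needed for a one-log10 (90%) reduction. The minimal Python sketch below assumes log-linear (first-order) inactivation and converts an exposure time into a log reduction. The function names are hypothetical. Under that assumption, after 33 min at 90° C the plate count falls about 6.6 log10 units while the qPCR signal falls about 1 log10 unit. This is consistent with DNA from killed spores remaining amplifiable.

```python
import math


def log10_reduction(minutes, d_value):
    """Decimal (log10) reduction after `minutes` of exposure, assuming
    first-order (log-linear) inactivation with the given D-value."""
    return minutes / d_value


def surviving_fraction(minutes, d_value):
    """Fraction remaining under the same assumption: N/N0 = 10 ** (-t / D)."""
    return 10 ** (-log10_reduction(minutes, d_value))


# D-values reported for FIG. 5 (B. pumilus spores at 90 deg C), in minutes.
D_PLATE, D_QPCR = 5.0, 33.0
t = 33.0
print("plate log10 reduction:", log10_reduction(t, D_PLATE))
print("qPCR log10 reduction:", log10_reduction(t, D_QPCR))
print("plate surviving fraction:", surviving_fraction(t, D_PLATE))
print("qPCR log10 check:", math.log10(1 / surviving_fraction(t, D_QPCR)))
```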

DETAILED DESCRIPTION

Definitions

As used herein, the term “oligonucleotide” refers to a polynucleotide, usually single stranded, that is either a synthetic polynucleotide or a naturally occurring polynucleotide. The length of an oligonucleotide is generally governed by its particular role, such as, for example, probe, primer and the like. Various techniques can be employed for preparing an oligonucleotide, for instance, biological synthesis or chemical synthesis. A nucleic acid disclosed herein will generally contain phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are included that may have alternate backbones, comprising, for example, phosphoramide (Beaucage, et al., Tetrahedron, 49(10):1925 (1993) and references therein; Letsinger, J. Org. Chem., 35:3800 (1970); Sprinzl, et al., Eur. J. Biochem., 81:579 (1977); Letsinger, et al., Nucl. Acids Res., 14:3487 (1986); Sawai, et al., Chem. Lett., 805 (1984); Letsinger, et al., J. Am. Chem. Soc., 110:4470 (1988); and Pauwels, et al., Chemica Scripta, 26:141 (1986)); phosphorothioate (Mag, et al., Nucleic Acids Res., 19:1437 (1991); and U.S. Pat. No. 5,644,048); phosphorodithioate (Briu, et al., J. Am. Chem. Soc., 111:2321 (1989)); O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press); and peptide nucleic acid backbones and linkages (see Egholm, J. Am. Chem. Soc., 114:1895 (1992); Meier, et al., Chem. Int. Ed. Engl., 31:1008 (1992); Nielsen, Nature, 365:566 (1993); Carlsson, et al., Nature, 380:207 (1996), all of which are incorporated by reference)). Other analog nucleic acids include those with positive backbones (Dempcy, et al., Proc. Natl. Acad. Sci. USA, 92:6097 (1995)); non-ionic backbones (U.S. Pat. Nos. 5,386,023; 5,637,684; 5,602,240; 5,216,141; and 4,469,863; Kiedrowski, et al., Angew. Chem. Intl. Ed. English, 30:423 (1991); Letsinger, et al., J. Am. Chem. Soc., 110:4470 (1988); Letsinger, et al., Nucleosides & Nucleotides, 13:1597 (1994); Chapters 2 and 3, ACS Symposium Series 580, “Carbohydrate Modifications in Antisense Research”, Ed. Y. S. Sanghvi and P. Dan Cook; Mesmaeker, et al., Bioorganic & Medicinal Chem. Lett., 4:395 (1994); Jeffs, et al., J. Biomolecular NMR, 34:17 (1994); Tetrahedron Lett., 37:743 (1996)); and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series 580, “Carbohydrate Modifications in Antisense Research”, Ed. Y. S. Sanghvi and P. Dan Cook. Nucleic acids containing one or more carbocyclic sugars are also included within the definition of nucleic acids (see Jenkins, et al., Chem. Soc. Rev., (1995) pp. 169-176). Several nucleic acid analogs are described in Rawls, C & E News, Jun. 2, 1997, page 35. All of these references are hereby expressly incorporated by reference.

The nucleic acid may be DNA, RNA, or a hybrid and may contain any combination of deoxyribo- and ribonucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine, isoguanine, and base analogs such as nitropyrrole and nitroindole. Oligonucleotides can be synthesized by standard methods, such as those used in commercial automated nucleic acid synthesizers, and later attached to an array, bead or other suitable surface. Alternatively, the oligonucleotides can be synthesized directly on the assay surface using photolithographic or other techniques. In some embodiments, linkers are used to attach the oligonucleotides to an array surface or to beads.

As used herein, the terms “nucleic acid,” “nucleic acid molecule,” and “polynucleotide” refer to a compound or composition that is a polymeric nucleotide or nucleic acid polymer. The nucleic acid molecule may be a natural compound or a synthetic compound. The nucleic acid molecule can have from about 2 to 5,000,000 or more nucleotides. The larger nucleic acid molecules are generally found in the natural state. In an isolated state, the nucleic acid molecule can have about 10 to 50,000 or more nucleotides, usually about 100 to 20,000 nucleotides; isolation of a nucleic acid molecule from the natural state thus often results in fragmentation. It may be useful to fragment longer target nucleic acid molecules, particularly RNA, prior to hybridization to reduce competing intramolecular structures. Fragmentation can be achieved chemically, enzymatically, or mechanically. Typically, when the sample contains DNA, a nuclease such as deoxyribonuclease (DNase) is employed to cleave the phosphodiester linkages. Nucleic acid molecules, and fragments thereof, include, but are not limited to, purified or unpurified forms of DNA (dsDNA and ssDNA) and RNA, including tRNA, mRNA, rRNA, mitochondrial DNA and RNA, chloroplast DNA and RNA, DNA/RNA hybrids, biological material or mixtures thereof, genes, chromosomes, plasmids, cosmids, the genomes of microorganisms, e.g., bacteria, yeasts, phage, chromosomes, viruses, viroids, molds, fungi, or other higher organisms such as plants, fish, birds, animals, humans, and the like. The polynucleotide can be only a minor fraction of a complex mixture such as a biological sample.

As used herein, the term “hybridize” refers to the process by which single strands of polynucleotides form a double-stranded structure through hydrogen bonding between the constituent bases. The ability of two polynucleotides to hybridize with each other is based on the degree of complementarity of the two polynucleotides, which in turn is based on the fraction of matched complementary nucleotide pairs. The more nucleotides in a given polynucleotide that are complementary to another polynucleotide, the more stringent the conditions can be for hybridization and the more specific will be the binding between the two polynucleotides. Increased stringency may be achieved by elevating the temperature, increasing the ratio of co-solvents, lowering the salt concentration, and combinations thereof.

As used herein, the terms “complementary,” “complement,” and “complementary nucleic acid sequence” refer to the nucleic acid strand that is related to the base sequence in another nucleic acid strand by the Watson-Crick base-pairing rules. In general, two polynucleotides are complementary when one polynucleotide can bind another polynucleotide in an anti-parallel sense wherein the 3′-end of each polynucleotide binds to the 5′-end of the other polynucleotide and each A, T(U), G, and C of one polynucleotide is then aligned with a T(U), A, C, and G, respectively, of the other polynucleotide. Polynucleotides that comprise RNA bases can also include complementary G/U or U/G base pairs. Two complementary strands may comprise complementary regions comprising all or one or more portions of one or both strands.
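The pairing rule above can be sketched as a reverse-complement function, where the reversal reflects the anti-parallel orientation. This minimal Python sketch handles only the canonical A/T(U) and G/C pairs. As a simplification, it ignores the G/U wobble pairs mentioned above and any ambiguity codes. The function names are hypothetical.

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq, rna=False):
    """Return the strand that pairs with `seq` under Watson-Crick rules,
    written 5'->3'; it is reversed because the two strands are antiparallel.
    G/U wobble pairs and ambiguity codes are deliberately not handled."""
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    return "".join(pairs[base] for base in reversed(seq.upper()))


def is_perfect_complement(a, b, rna=False):
    """True when every base of `b` pairs with the aligned base of `a`."""
    return reverse_complement(a, rna) == b.upper()


print(reverse_complement("ATGC"))
print(reverse_complement("AUGC", rna=True))
```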

As used herein, the term “clustering tree” refers to a hierarchical tree structure in which observations, such as organisms, genes, and polynucleotides, are separated into one or more clusters. The root node of a clustering tree consists of a single cluster containing all observations, and the leaf nodes correspond to individual observations. A clustering tree can be constructed on the basis of a variety of characteristics of the observations, such as sequences of the genes and morphological traits of the organisms. Many techniques known in the art, e.g., hierarchical clustering analysis, can be used to construct a clustering tree. A non-limiting example of the clustering tree is a phylogenetic, taxonomic or evolutionary tree.

As used herein, the terms “operational taxon unit,” “OTU,” “taxon,” “hierarchical cluster,” and “cluster” are used interchangeably. An operational taxon unit (OTU) refers to a group of one or more organisms that comprises a node in a clustering tree. The level of a cluster is determined by its hierarchical order. In some embodiments, an OTU is a group tentatively assumed to be a valid taxon for purposes of phylogenetic analysis. In some embodiments, an OTU is any of the extant taxonomic units under study. In yet other embodiments, an OTU is given a name and a rank. For example, an OTU can represent a domain, a sub-domain, a kingdom, a sub-kingdom, a phylum, a sub-phylum, a class, a sub-class, an order, a sub-order, a family, a subfamily, a genus, a subgenus, or a species. In some embodiments, OTUs can represent one or more organisms from the kingdoms eubacteria, protista, or fungi at any level of a hierarchical order. In some embodiments, an OTU represents a prokaryotic or fungal order.
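A clustering tree and its OTUs can be represented as nested nodes, in which each node's cluster is the set of leaves beneath it. The Python sketch below is a minimal illustration under that assumption. The class name, node names, and grouping are hypothetical examples, and depth in the tree stands in for hierarchical level.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a clustering tree; every node is a candidate OTU."""
    name: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the observations (leaf names) grouped under this node."""
        if not self.children:
            return [self.name]
        members = []
        for child in self.children:
            members.extend(child.leaves())
        return members

    def walk(self, depth=0):
        """Yield (depth, node) pairs; depth stands in for hierarchical level."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Illustrative tree only; names and grouping are examples, not array data.
tree = Node("Bacteria", [
    Node("Firmicutes", [Node("B. pumilus"), Node("B. subtilis")]),
    Node("Proteobacteria", [Node("E. coli")]),
])

for depth, node in tree.walk():
    print("  " * depth + node.name, node.leaves())
```

The root's cluster contains every observation, and each leaf's cluster contains only itself, matching the definition of a clustering tree above.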

As used herein, the term “kmer” refers to a polynucleotide of length k. In some embodiments, k is an integer from 1 to 1000. In some embodiments, k is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 250, 300, 400, 500, 600, 700, 800, 900, or 1000.
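Candidate kmers of a given length can be read off a target sequence with a sliding window. The Python sketch below assumes that tiling strategy purely for illustration; it is not stated here to be how probes are selected. The function name is hypothetical.

```python
def kmers(seq, k):
    """Return all overlapping substrings of length k, in order (sliding window).

    The 1..1000 bound mirrors the range of k given in the definition above.
    """
    if not 1 <= k <= 1000:
        raise ValueError("k must be between 1 and 1000")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(kmers("ACGTA", 3))
```

A sequence shorter than k yields no kmers.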

As used herein, the term “perfect match probe” (PM probe) refers to a kmer which is 100% complementary to at least a portion of a highly conserved target gene or polynucleotide. The perfect complementarity usually exists throughout the length of the probe. Perfect match probes, however, may have a segment or segments of perfect complementarity that is/are flanked by leading or trailing sequences lacking complementarity to the target gene or polynucleotide.

As used herein, the term “mismatch probe” (MM probe) refers to a control probe that is identical to a corresponding PM probe at all positions except for 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides of the PM probe. Typically, the non-identical position or positions are located at or near the center of the PM probe. In some embodiments, the mismatch probes are universal mismatch probes, e.g., a collection of mismatch probes that have no more than a set number of nucleotide variations or substitutions compared to positive probes. For example, the universal mismatch probes may differ in nucleotide sequence by no more than five nucleotides compared to any one PM probe in the PM probe set. In some embodiments, a MM probe is used adjacent to each test probe, e.g., a PM probe targeting a bacterial 16S rRNA sequence, in the array.
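The Python sketch below shows one way to derive mismatch probes from a PM probe. It assumes a single substitution at the central position, since the text places mismatches at or near the center. The sketch returns all three alternative bases at that position, which also illustrates one PM probe corresponding to several MM probes. A Hamming-distance helper checks how far each MM probe is from its PM probe. Function names are hypothetical.

```python
BASES = "ACGT"


def central_mismatch_probes(pm_probe):
    """Return the probes that differ from `pm_probe` only at the central
    position, one for each alternative base (an assumed design choice)."""
    pm_probe = pm_probe.upper()
    mid = len(pm_probe) // 2
    return [pm_probe[:mid] + b + pm_probe[mid + 1:]
            for b in BASES if b != pm_probe[mid]]


def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    if len(a) != len(b):
        raise ValueError("probes must have equal length")
    return sum(x != y for x, y in zip(a, b))


pm = "AAAAACAAAAA"
for mm in central_mismatch_probes(pm):
    print(mm, hamming(pm, mm))
```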

As used herein, the term “probe pair” refers to a PM probe and a corresponding MM probe. In some embodiments, the PM probes and the MM probes are scored in relation to each other during data processing and statistical analysis. As used herein, the term “a probe pair associated with an OTU” is defined as a pair of probes consisting of an OTU-specific PM probe and its corresponding MM probe.
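Scoring a PM probe in relation to its MM partner can be sketched as below. The ratio and difference thresholds are illustrative values chosen for this sketch, not values taken from this application, and the function names are hypothetical. The per-OTU summary is simply the fraction of that OTU's probe pairs that score positive.

```python
def pair_is_positive(pm, mm, min_ratio=1.3, min_diff=100.0):
    """Score one probe pair: positive when the PM signal exceeds its MM
    partner by both a ratio and an absolute margin (illustrative thresholds)."""
    ratio = pm / mm if mm > 0 else float("inf")
    return ratio >= min_ratio and (pm - mm) >= min_diff


def positive_fraction(pairs, **thresholds):
    """Fraction of an OTU's (PM, MM) probe pairs that score positive."""
    if not pairs:
        return 0.0
    hits = sum(pair_is_positive(pm, mm, **thresholds) for pm, mm in pairs)
    return hits / len(pairs)


# Hypothetical (PM, MM) intensities for the probe pairs associated with one OTU.
otu_pairs = [(1500, 300), (800, 700), (1200, 400), (90, 10)]
print(positive_fraction(otu_pairs))
```

Here the second pair fails the ratio test and the fourth fails the difference test, so half of the pairs are positive.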

As used herein, a “sample” is from any source, including, but not limited to, a biological sample, a gas sample, a fluid sample, a solid sample, or any mixture thereof.

As used herein, a “microorganism” or “organism” includes, but is not limited to, a virus, viroids, bacteria, archaea, fungi, protozoa and the like.

The term “sensitivity” refers to a measure of the proportion of actual positives which are correctly identified as such.

The term “specificity” refers to a measure of the proportion of actual negatives which are correctly identified as such.
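In a validation experiment where the true presence or absence of each OTU is known, the two measures reduce to simple ratios of counts. In this minimal Python sketch, the function names and the example counts are hypothetical: 45 of 50 known-present OTUs are detected, and 990 of 1000 known-absent OTUs are correctly not reported.

```python
def sensitivity(tp, fn):
    """Proportion of actual positives correctly identified: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of actual negatives correctly identified: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts from a mock community of known composition.
print("sensitivity:", sensitivity(tp=45, fn=5))
print("specificity:", specificity(tn=990, fp=10))
```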

The term “confidence level” refers to the likelihood, expressed as a percentage, that the results of a test are real and repeatable, and not random. Confidence levels are used to indicate the reliability of an estimate and can be calculated by a variety of methods.

Live Cell Detection

In some embodiments, the methods comprise selectively amplifying a nucleic acid from live cells from a sample comprising live and dead cells. In some embodiments, the nucleic acid is DNA and/or RNA. In some embodiments, selectively amplifying comprises pre-treating the sample with an agent that selectively modifies a nucleic acid of dead cells. Selective modification can be the result of differences between the accessibility of DNA in live cells and of DNA in and/or derived from dead cells to the selective agent, such as an agent that is cell membrane- and/or cell wall-impermeant. In some embodiments, the DNA of dead cells is more accessible to the selective agent than the DNA of live cells. Selective modification can comprise intercalation into the nucleic acid. Exposure to an agent that selectively modifies a nucleic acid of dead cells may be followed by exposure to a second agent that further modifies the nucleic acid modified by the selective agent. In some embodiments, the second agent is visible light. In some embodiments, modified nucleic acid is selectively removed prior to amplification. In some embodiments, modified nucleic acid is present during amplification but is not efficiently amplified. For example, unmodified DNA can be amplified at least about 5-fold, 10-fold, 25-fold, 50-fold, 100-fold, 200-fold, 300-fold, 400-fold, 500-fold, 600-fold, 700-fold, 800-fold, 900-fold, 1000-fold, 2000-fold, 5000-fold, 10000-fold, 20000-fold, 50000-fold, 100000-fold, or more than modified DNA. In some embodiments, the agent that selectively modifies a nucleic acid of dead cells is a DNA intercalating agent. Examples of DNA intercalating agents include, but are not limited to, ethidium compounds, such as ethidium monoazide, ethidium bromide, and ethidium diazide; propidium compounds, such as propidium monoazide and propidium iodide; and 7-aminoactinomycin D.
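The fold-differences above matter most when dead cells dominate a sample. The minimal Python sketch below estimates what share of the amplified product would still come from dead cells. It assumes that each cell contributes equal template and that modified DNA is amplified a fixed number of fold less efficiently than unmodified DNA. Both assumptions are simplifications, and the function name and the 1% live-cell example are hypothetical.

```python
def dead_cell_share(live_fraction, suppression_fold):
    """Share of amplified product that still derives from dead cells.

    Simplifying assumptions: every cell contributes the same amount of
    template, and modified (dead-cell) DNA is amplified `suppression_fold`
    times less efficiently than unmodified (live-cell) DNA.
    """
    if not 0.0 <= live_fraction <= 1.0 or suppression_fold < 1:
        raise ValueError("live_fraction must be in [0, 1] and fold >= 1")
    dead_signal = (1.0 - live_fraction) / suppression_fold
    return dead_signal / (live_fraction + dead_signal)


# Hypothetical sample in which only 1% of cells are alive.
for fold in (10, 100, 1000, 10000):
    print(fold, round(dead_cell_share(0.01, fold), 4))
```

Under these assumptions, a 100-fold suppression still leaves roughly half of the product from dead cells when 99% of cells are dead. This is why the larger fold-differences listed above are useful.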

DNA intercalating agents such as ethidium monoazide (EMA) and propidium monoazide (PMA) can be used to distinguish between viable and dead bacterial cells, because they selectively penetrate the membranes of dead cells. Once inside, they intercalate into the DNA and, upon photolysis using visible light, produce stable DNA monoadducts. These monoadducts are not efficiently amplified by PCR, either because they are removed during or subsequent to the DNA extraction process or because they serve as poor templates in the amplification process. Therefore, when PMA is used to treat bacterial cells prior to DNA extraction, only the DNA of viable cells will be available for PCR amplification. In some embodiments, excess amounts of the selective agent are removed before nucleic acid extraction. Following selective amplification of nucleic acid from live cells, amplified nucleic acid can be subjected to a detection process, such as quantitative PCR or a probe-based analysis. In some embodiments, amplified nucleic acids are hybridized with a plurality of probes.
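When the exclusion of dead-cell DNA is checked by real-time PCR, as in the before-and-after comparison of FIG. 3, a shift in threshold cycle (Ct) can be converted into a fold reduction in amplifiable template. The Python sketch below assumes a constant amplification efficiency per cycle. The function name and the Ct values in the usage example are hypothetical.

```python
def fold_reduction_from_delta_ct(ct_untreated, ct_pma, efficiency=1.0):
    """Estimate how many fold fewer amplifiable templates remain after PMA.

    Assumes a constant efficiency E per cycle (1.0 = perfect doubling), so
    each cycle multiplies product by (1 + E) and a Ct shift of dCt cycles
    corresponds to a (1 + E) ** dCt fold difference in starting template.
    """
    delta_ct = ct_pma - ct_untreated
    return (1.0 + efficiency) ** delta_ct


# Hypothetical Ct values for heat-killed cells without and with PMA treatment.
print(fold_reduction_from_delta_ct(18.0, 28.0))
print(fold_reduction_from_delta_ct(18.0, 28.0, efficiency=0.9))
```

With perfect doubling, a 10-cycle delay corresponds to a 1024-fold reduction. A lower assumed efficiency gives a smaller estimate for the same shift.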

In some embodiments, detection of amplified nucleic acid comprises detection of rRNA or DNA encoding rRNA, such as 16S rRNA. Molecular methods based on detecting and analyzing 16S rRNA have been developed to enable the identification of specific microorganisms in a mixed bacterial population. Among such methods, DNA microarray analysis is significantly more sensitive and robust than traditional rRNA gene sequencing methods. A 16S rRNA-based phylogenetic microarray (“PhyloChip”) allows detailed measurements of microbial community composition in a high-throughput and reproducible manner. Phylogenetic microarrays and methods of making and using the same are described in U.S. Application Ser. No. 61/259,565 [Attorney Docket No. IB-2733P1], filed on Nov. 9, 2009; U.S. Application Ser. No. 61/317,644 [Attorney Docket No. IB-2733P2], filed on Mar. 25, 2010; U.S. Application Ser. No. 61/347,817 [Attorney Docket No. IB-2733P3], filed on May 24, 2010; co-pending U.S. patent application Ser. No. 12/474,204, filed on May 28, 2009; and co-pending international application having application number PCT/US2010/040106 [Attorney Docket No. IB-2733PCT], filed on Jun. 25, 2010, all of which are hereby incorporated by reference in their entirety for all purposes.

Features of phylogenetic microarrays disclosed herein include the use of multiple oligonucleotide probes for every known category of prokaryotic organisms for high-confidence detection, and the pairing of at least one mismatch probe with every perfectly matched probe to minimize the effect of nonspecific hybridization. In some embodiments, each perfect match probe corresponds to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more mismatch probes. These and other features, alone or in combination as described herein, make arrays of the methods and systems disclosed herein extremely sensitive, allowing identification of very low levels of microorganisms.

Biosignatures

In one aspect, the present application utilizes a biosignature of OTUs. As used herein, the term “biosignature” refers to an association of the level of one or more members of one or more OTUs with a particular condition or state, such as a classification, diagnosis, prognosis, and/or predicted outcome of a disease condition in a subject. In some embodiments, the biosignature comprises a determination of the presence, absence, relative abundance, and/or quantity of at least 1, 2, 3, 4, 5, 10, 20, 50, 100, 250, 500, 1000, 5000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 250,000, 500,000 or 1,000,000 OTUs in a sample using a single assay. In some embodiments, the biosignature comprises the presence of or changes in the level of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 75, 100, 125, 150, 175, 200, 250, 300, or more OTUs. In some embodiments, the OTUs consist essentially of viable organisms from a sample.
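A biosignature that records presence, absence, and relative abundance can be sketched as a per-OTU summary. This minimal Python version assumes one non-negative abundance estimate per OTU (for example, a live-cell count or a normalized intensity) and a single detection threshold. The function name, family names, and numbers are hypothetical.

```python
def biosignature(otu_signals, detection_threshold):
    """Summarize per-OTU signals as presence/absence plus relative abundance.

    `otu_signals` maps OTU name -> non-negative abundance estimate. OTUs at
    or below `detection_threshold` are treated as absent, and relative
    abundance is computed over the detected OTUs only.
    """
    present = {otu: v for otu, v in otu_signals.items() if v > detection_threshold}
    total = sum(present.values())
    summary = {}
    for otu in otu_signals:
        detected = otu in present
        share = present[otu] / total if detected and total > 0 else 0.0
        summary[otu] = {"present": detected, "relative_abundance": share}
    return summary


# Hypothetical live-cell estimates for four bacterial families.
signals = {"Bacillaceae": 600, "Staphylococcaceae": 300,
           "Enterobacteriaceae": 100, "Deinococcaceae": 5}
for otu, entry in biosignature(signals, detection_threshold=10).items():
    print(otu, entry)
```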

In some embodiments, the biosignature is associated with a single condition. In some embodiments, the biosignature is associated with a combination of conditions. A biosignature can be obtained for any sample, including but not limited to: tissue samples; cell culture samples; bacterial culture samples; samples obtained from a subject, including biopsies, body fluids and other excreted material; pulmonary samples; environmental samples; other samples as described herein; materials derived therefrom; and combinations thereof. In some embodiments, a biosignature of a test sample is compared to a known biosignature, and a determination is made as to the likelihood that the biosignatures are the same. The biosignature to which the biosignature of the test sample is compared can be determined before, after, or at substantially the same time as that of the test sample. Biosignatures can be the result of one or more analyses of one or more samples from a particular source. In some embodiments, a biosignature is indicative of a response to treatment. In some embodiments, a biosignature is used as a basis for the selection of a mode of treatment.
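The text does not specify how the likelihood that two biosignatures are the same is determined. As a swapped-in illustration only, the Python sketch below compares the sets of detected OTUs with the Jaccard index. The 0.8 cutoff and the function names are hypothetical.

```python
def jaccard(present_a, present_b):
    """Jaccard similarity of two collections of detected OTUs (1.0 = identical)."""
    a, b = set(present_a), set(present_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def likely_same(test_otus, reference_otus, cutoff=0.8):
    """Hypothetical decision rule: call the biosignatures the same when their
    detected-OTU sets overlap at or above an illustrative cutoff."""
    return jaccard(test_otus, reference_otus) >= cutoff


test_sample = {"OTU_A", "OTU_B", "OTU_C"}
reference = {"OTU_B", "OTU_C", "OTU_D"}
print(jaccard(test_sample, reference), likely_same(test_sample, reference))
```

Presence-only comparison ignores abundance. An abundance-weighted distance could be substituted where relative abundance is part of the biosignature.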

In some embodiments, the biosignature of a test sample is a combination of two or more independent biosignatures, such as 2, 3, 4, 5, 6, 7, 8, 9, 10 or more independent biosignatures. In some embodiments, each of the two or more biosignatures contained in a sample is assayed simultaneously. In a further embodiment, a subset of biosignatures can be evaluated through the use of low-density detection systems, comprising the determination of the presence, absence, relative abundance, and/or quantity of no more than 10, 25, 50, 100, 250, 500, 1000, 2000, or 5000 OTUs.

In some embodiments, a biosignature comprises a measure of the number of members, such as the number of live members, in one or more bacterial families or OTUs. The number of members can range from 0 to 10000 or more, such as 0 to 5000, 0 to 2500, 0 to 2000, 0 to 1000, 0 to 900, 0 to 800, 0 to 700, 0 to 600, 0 to 500, 0 to 400, 0 to 300, 0 to 200, 0 to 100, 0 to 50, 0 to 25, 0 to 20, 0 to 10, or 0 to 5. In some embodiments, a biosignature comprises the presence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 2500, 5000, 10000, or more members of one or more bacterial families or OTUs, or the presence of a range that includes any two of these values as end points. In some embodiments, a biosignature comprises a ratio between numbers of members in two or more bacterial families or OTUs. The numerator and denominator of such ratios can include overlapping sets of bacterial families or OTUs. Ratios of the numbers of members in two or more bacterial families can compare a first set of one or more bacterial families or OTUs to a second set of one or more bacterial families or OTUs, where there is at least one bacterial family or OTU difference between the first and second set. A set of bacterial families or OTUs can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or more OTUs.
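A ratio between numbers of members in two sets of families or OTUs can be sketched as follows. The two sets may overlap but must differ by at least one member. In this minimal Python sketch, the function name, family names, and counts are hypothetical.

```python
def member_ratio(counts, numerator_otus, denominator_otus):
    """Ratio of summed member counts in two sets of families or OTUs.

    The sets may overlap but must differ by at least one family or OTU.
    OTUs missing from `counts` contribute zero members.
    """
    num_set, den_set = set(numerator_otus), set(denominator_otus)
    if num_set == den_set:
        raise ValueError("the two sets must differ by at least one OTU")
    num = sum(counts.get(otu, 0) for otu in num_set)
    den = sum(counts.get(otu, 0) for otu in den_set)
    if den == 0:
        raise ZeroDivisionError("denominator set has no members")
    return num / den


# Hypothetical numbers of live members per family.
counts = {"Bacillaceae": 120, "Staphylococcaceae": 40, "Micrococcaceae": 20}
print(member_ratio(counts, ["Bacillaceae"], ["Staphylococcaceae", "Micrococcaceae"]))
# Overlapping sets are allowed as long as they are not identical.
print(member_ratio(counts, ["Bacillaceae", "Staphylococcaceae"],
                   ["Staphylococcaceae", "Micrococcaceae"]))
```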

In one aspect, the present application provides methods, systems, and compositions for detecting and identifying a plurality of biomolecules and/or organisms in a sample. The present application utilizes the ability to differentiate between individual organisms or OTUs, as well as, in some embodiments, between live and dead members of OTUs. In one aspect, the individual organisms or OTUs are identified using organism-specific and/or OTU-specific probes, e.g., oligonucleotide probes. More specifically, some embodiments relate to selecting organism-specific and/or OTU-specific oligonucleotide probes useful in detecting and identifying biomolecules and organisms in a sample. In some embodiments, an oligonucleotide probe is selected on the basis of the cross-hybridization pattern of the oligonucleotide probe to regions within a target oligonucleotide and its homologs in a plurality of organisms. The homologs can have nucleotide sequences that are at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% identical. Such oligonucleotides can be genic or intergenic sequences, in whole or in part. The oligonucleotides can range from 10 to over 10,000 nucleotides in length. In some other embodiments, a method is provided for detecting the presence of an OTU in a sample based at least partly on the cross-hybridization of the OTU-specific oligonucleotide probes to probes specific for other organisms or OTUs. In some embodiments, the biosignature to which a sample biosignature is compared comprises a positive result for the presence of the targets for one or more probes.
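Percent identity between homologs can be computed from a pairwise alignment. The Python sketch below assumes the two sequences are already aligned; the alignment step itself is not shown. It excludes gap columns from the denominator, which is one of several conventions in use. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the
    comparison (one convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
                if x != gap and y != gap]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
print(percent_identity("ACG-T", "ACGAT"))
```

The first pair differs at one of ten positions. The second pair matches at every ungapped column.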

In one aspect, the present application provides a diagnostic system for the determination or evaluation of a biosignature of a sample. In some embodiments, the diagnostic system comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 75, 100, 125, 150, 175, 200, 250, 300, or more probes. In some embodiments, the diagnostic system comprises up to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 75, 100, 125, 150, 175, 200, 250, 300, or more probes.

High Capacity Systems

In one aspect of the present application, a high capacity system is provided for determining a biosignature of a sample by assessing the total microorganism population of a sample in terms of the microorganisms present and optionally their percent composition of the total population. In some embodiments, the microorganisms so determined consist essentially of viable organisms from a sample. In some embodiments, the system comprises a plurality of probes that are capable of determining the presence or quantity of at least 1000, 5000, 10000, 20000, 30000, 40000, 50000, 60000, or more different OTUs in a single assay. Typically, the probes selectively hybridize to a highly conserved polynucleotide. Usually, the probes hybridize to the same highly conserved polynucleotide or within a portion thereof. Generally, the highly conserved polynucleotide or fragment thereof comprises a gene or fragment thereof. Non-limiting examples of highly conserved polynucleotides comprise nucleotide sequences found in the 16S rRNA gene, 23S rRNA gene, 5S rRNA gene, 5.8S rRNA gene, 12S rRNA gene, 18S rRNA gene, 28S rRNA gene, gyrB gene, rpoB gene, fusA gene, recA gene, cox1 gene and/or nifD gene. In other embodiments, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, 15 or more, 20 or more, 25 or more, or 50 or more collections of probes are employed, each of which specifically hybridizes to a different highly conserved polynucleotide or portion thereof. For example, a first collection of probes binds to the same region of the 16S rRNA gene, a second collection of probes binds to the same region of the 16S rRNA gene, which is different from the region bound by probes in the first collection, and a third collection of probes binds to the same region of the 23S rRNA gene.
The use of two or more collections of probes, where each collection recognizes distinct and separate highly conserved polynucleotides or portions thereof, allows for the generation and testing of more probes, the use of which can provide greater discrimination between species or OTUs.

Highly conserved polynucleotides usually show at least 80%, 85%, 90%, 92%, 94%, 95%, or 97% homology across a domain, kingdom, phylum, class, order, family or genus, respectively. The sequences of these polynucleotides can be used for determining evolutionary lineage or making a phylogenetic determination and are also known as phylogenetic markers. In some embodiments, a biosignature comprises the presence, absence, and/or abundance of a combination of phylogenetic markers. The OTUs detected by the probes disclosed herein can be bacterial, archaeal, fungal, or eukaryotic in origin. Additionally, the methodologies disclosed herein can be used to quantify OTUs that are bacterial, archaeal, fungal, or eukaryotic. By combining the various probe sets, a system for the detection of bacteria, archaea, fungi, eukaryotes, or combinations thereof can be designed. Such a universal microorganism test that is conducted as a single assay can provide great benefit for assessing and understanding the composition and ecology of numerous environments, including characterization of biosignatures for various samples, environments, conditions, and contaminants.

In another aspect of the present application, a system is provided that is capable of determining the probability of presence and optionally quantity of at least 10000, 20000, 30000, 40000, 50000, 60000, or more different OTUs of a single domain in a single assay. In some embodiments, such a system makes a probability determination with a confidence level greater than 90%, 91%, 92%, 93%, 94%, 95%, 99% or 99.5%. In some embodiments, a biosignature can comprise the combined result of each probability determination.

Some embodiments provide a method of selecting an oligonucleotide probe that is specific for a node in a clustering tree. In some embodiments, the method comprises selecting a highly conserved target polynucleotide and its homologs for a plurality of organisms; clustering the polynucleotides and homologs of the plurality of organisms into a clustering tree; and determining, for each node on the clustering tree, the cross-hybridization pattern of a candidate oligonucleotide probe that hybridizes to a first polynucleotide. This determination is performed (e.g., in silico) to determine the likelihood that the probe would cross-hybridize with homologs of its target complementary sequence. The candidate oligonucleotide probe can be complementary to a highly conserved target polynucleotide, a fragment of the highly conserved target, or one of its homologs in one of the plurality of organisms. In some embodiments, a method is provided for determining the cross-hybridization pattern of a variant of the candidate oligonucleotide probe to each node on the clustering tree, wherein the variant corresponds to the candidate oligonucleotide probe but comprises at least 1 nucleotide mismatch; and selecting or rejecting the candidate oligonucleotide probe on the basis of the cross-hybridization pattern of the candidate oligonucleotide probe and the cross-hybridization pattern of the variant. In some embodiments, the node is an operational taxonomic unit (OTU). In some embodiments, the node is a single organism.
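The in silico selection step can be sketched as follows, under simplifying assumptions: each node is represented by one sequence, a probe is predicted to bind a node when its complementary site matches some window of that sequence with at most `max_mm` mismatches, the variant carries the complementary base at the central position, and a candidate is kept only if it binds its own node alone and its variant perfectly matches no node. That acceptance rule, the node sequences and all function names are hypothetical; a practical design tool would normally use thermodynamic hybridization models rather than raw mismatch counts.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def min_mismatches(probe, target):
    """Fewest mismatches between the probe's binding site and any window of target."""
    site = revcomp(probe)  # target stretch that a perfectly complementary probe binds
    n = len(site)
    return min(
        (sum(a != b for a, b in zip(site, target[i:i + n]))
         for i in range(len(target) - n + 1)),
        default=n,  # target shorter than the probe: treat as no binding
    )


def pattern(probe, nodes, max_mm):
    """Cross-hybridization pattern: names of nodes the probe is predicted to bind."""
    return {name for name, seq in nodes.items() if min_mismatches(probe, seq) <= max_mm}


def central_variant(probe):
    """Variant with one mismatch: the central base replaced by its complement."""
    mid = len(probe) // 2
    return probe[:mid] + probe[mid].translate(COMPLEMENT) + probe[mid + 1:]


def select_probe(probe, nodes, target_node, max_mm=1):
    """Hypothetical rule: the candidate binds only its own node (allowing max_mm
    mismatches), and its variant is a perfect match to no node."""
    return (pattern(probe, nodes, max_mm) == {target_node}
            and not pattern(central_variant(probe), nodes, 0))


# Hypothetical nodes: OTU_2 differs from OTU_1 at three positions of the probed region.
region = "ACGTTGCAAGGCTTACGATCCGTAG"
nodes = {
    "OTU_1": "GG" + region + "CC",
    "OTU_2": "GG" + region[:2] + "A" + region[3:12] + "A" + region[13:22] + "A" + region[23:] + "CC",
}
probe = revcomp(region)
print(select_probe(probe, nodes, "OTU_1"))  # True
```

Adding a node that differs from `region` at only one position would put that node into the candidate's pattern and cause the candidate to be rejected.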

Some embodiments provide a method of selecting an OTU-specific oligonucleotide probe for use in detecting a plurality of organisms in a sample. In some embodiments, the method comprises: selecting a highly conserved target polynucleotide and its homologs from the plurality of organisms; clustering the polynucleotides of the target gene and its homologs from the plurality of organisms into one or more operational taxonomic units (OTUs), wherein each OTU comprises one or more groups of similar nucleotide sequences; determining the cross-hybridization pattern of a candidate OTU-specific oligonucleotide probe to the OTUs, wherein the candidate OTU-specific oligonucleotide probe corresponds to a fragment of the target gene or its homolog from one of the plurality of organisms; determining the cross-hybridization pattern of a variant of the candidate OTU-specific oligonucleotide probe to the OTUs, wherein the variant comprises at least 1 nucleotide mismatch from the candidate OTU-specific oligonucleotide probe; and selecting or rejecting the candidate OTU-specific oligonucleotide probe on the basis of the cross-hybridization pattern of the candidate OTU-specific oligonucleotide probe and the cross-hybridization pattern of the variant. In some embodiments, the candidate OTU-specific oligonucleotide probe is selected if the candidate OTU-specific oligonucleotide probe does not cross-hybridize with any polynucleotide that is complementary to probes from other OTUs. In further embodiments, the candidate OTU-specific oligonucleotide probe is selected if the candidate OTU-specific oligonucleotide probe cross-hybridizes with the polynucleotide in no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 100, 200, 500, or 1000 other OTU groups.
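Once cross-hybridization patterns have been computed, applying the "no more than N other OTU groups" criterion is a simple filter. In this hypothetical sketch, `patterns` maps each candidate probe to the set of OTUs that it or its mismatch variant is predicted to bind, `probe_otu` records the OTU each candidate was designed for, and `max_other_otus` is the tolerated number of off-target OTUs; the data structures and names are assumptions made for illustration.

```python
def select_otu_probes(patterns, probe_otu, max_other_otus=0):
    """Keep candidates whose predicted off-target binding stays within budget.

    patterns: probe id -> set of OTUs the candidate or its mismatch variant is
              predicted to bind (hypothetical precomputed input)
    probe_otu: probe id -> OTU the candidate was designed for
    max_other_otus: number of OTUs other than its own a probe may cross-hybridize with
    Returns OTU -> list of selected probe ids.
    """
    selected = {}
    for probe, hits in patterns.items():
        own = probe_otu[probe]
        if own not in hits:
            continue  # candidate is not even predicted to bind its own OTU
        if len(hits - {own}) <= max_other_otus:
            selected.setdefault(own, []).append(probe)
    return selected


patterns = {
    "p1": {"OTU_A"},
    "p2": {"OTU_A", "OTU_B"},
    "p3": {"OTU_B", "OTU_C", "OTU_D"},
    "p4": {"OTU_C"},
}
probe_otu = {"p1": "OTU_A", "p2": "OTU_A", "p3": "OTU_B", "p4": "OTU_D"}
print(select_otu_probes(patterns, probe_otu))                    # {'OTU_A': ['p1']}
print(select_otu_probes(patterns, probe_otu, max_other_otus=1))  # {'OTU_A': ['p1', 'p2']}
```

Setting `max_other_otus=0` corresponds to the strict embodiment in which a selected probe cross-hybridizes with no other OTU.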

Some embodiments provide a method of selecting a set of organism-specific oligonucleotide probes for use in detecting a plurality of organisms in a sample. In some embodiments, the method comprises: identifying a highly conserved target polynucleotide and its homologs in the plurality of organisms; determining the cross-hybridization pattern of a candidate organism-specific oligonucleotide probe to the sequences of the highly conserved target polynucleotide and its homologs in the plurality of organisms, wherein the candidate oligonucleotide probe corresponds to a fragment of the target sequence or its homolog from one of the plurality of organisms; determining the cross-hybridization pattern of a variant of the candidate organism-specific oligonucleotide probe to the sequences of the highly conserved target sequence and its homologs in the plurality of organisms, wherein the variant comprises at least 1 nucleotide mismatch from the candidate organism-specific oligonucleotide probe; and selecting or rejecting the candidate organism-specific oligonucleotide probe on the basis of the cross-hybridization pattern of the candidate organism-specific oligonucleotide probe and the cross-hybridization pattern of the variant of the candidate organism-specific oligonucleotide probe.

In some embodiments, an OTU-specific oligonucleotide probe does not cross-hybridize with any polynucleotide that is complementary to probes from other OTUs. In other embodiments, an OTU-specific oligonucleotide probe cross-hybridizes with the polynucleotide in no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 100, 200, 500, or 1000 other OTU groups. Some embodiments utilize a set of organism-specific oligonucleotide probes for use in detecting a plurality of organisms in a sample. In further embodiments, the candidate organism-specific oligonucleotide probe is selected if the candidate organism-specific oligonucleotide probe only hybridizes with the target nucleic acid molecule of no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, or 50 unique organisms in the plurality of organisms. In other embodiments, the process is iterative, with multiple candidate organism-specific oligonucleotide probes selected. Frequently, the selected organism-specific oligonucleotide probes are clustered and aligned into groups of similar sequences that allow for the detection of an organism with high confidence based on no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 50, or 60 organism-specific oligonucleotide probe matches per OTU. Generally, the candidate organism that the organism-specific oligonucleotide probes detect corresponds to a leaf or node of at least one phylogenetic, genealogic, evolutionary, or taxonomic tree. Knowledge of the position that a candidate organism detected by the organism-specific oligonucleotide probe occupies on a tree provides relational information of the organism to other members of its domain, phylum, class, subclass, order, family, subfamily, or genus.

In some embodiments, the method disclosed herein selects and/or utilizes a set of organism-specific oligonucleotide probes that are a hierarchical set of oligonucleotide probes that can be used to detect and differentiate a plurality of organisms. In some embodiments, the method selects and/or utilizes organism-specific or OTU-specific oligonucleotide probes that allow a comprehensive screen for at least 80%, 85%, 90%, 95%, 99% or 100% of all known bacterial or archaeal taxa in a single analysis, and thus provides enhanced detection of different desired taxonomic groups. In some embodiments, the identity of all known bacterial or archaeal taxa comprises taxa that were previously identified by the use of oligonucleotide specific probes, PCR cloning, and sequencing methods. Some embodiments provide methods of selecting and/or utilizing a set of oligonucleotide probes capable of correctly categorizing mixed target nucleic acid molecules into their proper operational taxonomic unit (OTU) designations. Such methods can provide comprehensive prokaryotic or eukaryotic identification, and thus comprehensive biosignature characterization.

In some embodiments, the selected OTU-specific oligonucleotide probe is used to calculate the relative abundance of one or more organisms that belong to a specific OTU at differing levels of taxonomic identification. In some embodiments, an array or collection of microparticles comprising at least one organism-specific or OTU-specific oligonucleotide probe selected by the method disclosed herein is provided to infer specific microbial community activities. For example, the identity of individual taxa in a microbial consortium from an anaerobic environment, for instance a marsh, can be determined along with their relative abundance. If the consortium is suspected of harboring microorganisms capable of butanol fermentation, and butanol production is observed after a suitable feedstock is provided under anaerobic conditions, then the taxa responsible for butanol fermentation can be inferred from the microorganisms that have abundant quantities of 16S rRNA. The present application provides methods to measure taxa abundance based on the detection of directly labeled 16S rRNA.

In some embodiments, multiple probes are selected for increasing the confidence level and/or sensitivity level of identification of a particular organism or OTU. The use of multiple probes can greatly increase the confidence level of a match to a particular organism. In some embodiments, the selected organism-specific oligonucleotide probes are clustered and aligned into groups of similar sequence such that detection of an organism is based on 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35 or more oligonucleotide probe matches. In some embodiments, the oligonucleotide probes are specific for a species. In other embodiments, the oligonucleotide probe recognizes related organisms such as organisms in the same subgenus, genus, subfamily, family, sub-order, order, sub-class, class, sub-phylum, phylum, sub-kingdom, or kingdom.

Perfect match (PM) probes are perfectly complementary to the target polynucleotide, e.g., a sequence that identifies a particular organism or OTU. In some embodiments, a system comprises mismatch (MM) control probes. Usually, MM probes are otherwise identical to PM probes, but differ by one or more nucleotides. Probes with one or more mismatches can be used to indicate non-specific binding and a possible non-match to the target sequence. In some embodiments, the MM probes have one mismatch located in the center of the probe, e.g., in position 13 for a 25mer probe. The MM probe is scored in relation to its corresponding PM probe as a “probe pair.” MM probes can be used to estimate the background hybridization, thereby reducing the occurrence of false positive results due to non-specific hybridization, a significant problem with many current detection systems. If an array is used, such as an Affymetrix high density probe array or Illumina bead array, ideally, the MM probe is positioned adjacent or close to its corresponding PM probe on the array. Sample PM and MM probes are provided as SEQ ID NOs: 1-50.
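A minimal sketch of a PM/MM probe pair, assuming the MM control carries the complementary base at the central position (position 13 of a 25mer) and that a pair is scored as the PM signal minus the MM signal, floored at zero. The substitution rule, the scoring formula and the example sequence are illustrative choices, not the scoring method of this disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def make_mm_probe(pm):
    """MM control: identical to the PM probe except that the central base
    (position 13 of a 25mer) is replaced by its complement."""
    mid = len(pm) // 2
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def score_pair(pm_signal, mm_signal):
    """Background-corrected probe-pair signal: PM minus MM, never below zero."""
    return max(pm_signal - mm_signal, 0.0)


pm = "ACGTTGCAAGGCTTACGATCCGTAG"  # hypothetical 25mer PM probe
mm = make_mm_probe(pm)
print(mm)                         # ACGTTGCAAGGCATACGATCCGTAG
print(score_pair(1200.0, 300.0))  # 900.0
```

Because the MM probe shares all but one base with its PM partner, signal that survives the subtraction is attributed to specific binding rather than background.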

In general, each MM probe differs from its corresponding PM probe by at least one nucleotide. In some embodiments, the MM probe differs from its corresponding PM probe by 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides. Within a MM probe, the mismatched nucleotide or nucleotides can include any of the 3 central bases that are not found in the same position or positions in the PM probe. For example, with a 25mer PM probe that has a guanine at the 13th position, i.e., the central nucleotide, the MM probes comprise probes with adenine, thymine, uracil or cytosine at the 13th position. Similarly, with a 25mer PM probe with an adenine at the 12th nucleotide position and a guanine at the 13th nucleotide position when read from the 3′ direction, the possible MM probes comprise probes with guanine at the 12th nucleotide position and adenine, thymine or cytosine at the 13th nucleotide position; cytosine at the 12th nucleotide position and adenine, thymine or cytosine at the 13th nucleotide position; and thymine at the 12th nucleotide position and adenine, thymine or cytosine at the 13th nucleotide position. In some embodiments, the mismatched nucleotide or nucleotides include any one or more of the nucleotides in a corresponding PM probe. Increasing the number of MM probes and/or the mismatch positions represented can be used to enhance quantification, accuracy, and confidence.
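The MM probe sets in the 12th/13th-position example can be enumerated mechanically: for each chosen position, substitute every DNA base other than the one in the PM probe, and take all combinations across positions. In this sketch, positions are 1-based and counted along the string as written (reverse the string first if positions are read from the other end), only A, C, G and T are used (uracil would replace thymine for RNA probes), and the PM sequence and function name are hypothetical.

```python
from itertools import product

BASES = "ACGT"  # DNA probes; uracil would stand in for thymine in RNA probes


def mismatch_probes(pm, positions):
    """All probes that differ from `pm` at every listed position and match it elsewhere.

    positions: 1-based positions counted along the string as written.
    """
    idx = [p - 1 for p in positions]
    choices = [[b for b in BASES if b != pm[i]] for i in idx]
    probes = []
    for subs in product(*choices):
        seq = list(pm)
        for i, base in zip(idx, subs):
            seq[i] = base
        probes.append("".join(seq))
    return probes


# Hypothetical 25mer PM probe with A at position 12 and G at position 13.
pm = "ACGTTGCAAGGAGTACGATCCGTAG"
print(len(mismatch_probes(pm, [13])))      # 3
print(len(mismatch_probes(pm, [12, 13])))  # 9
```

With two positions, the three alternatives at each give 3 × 3 = 9 probes, matching the nine combinations listed above.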

Some embodiments relate to a method of selecting and/or utilizing a set of oligonucleotide probes that enable simultaneous identification of multiple prokaryotic taxa with a relatively high confidence level. Typically, the confidence level of identification is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5%. In general, an OTU refers to an individual species or group of highly related species that share an average of at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or more sequence homology in a highly conserved region. Multiple MM probes can be utilized to enhance the quantification and confidence of the measure. In some embodiments, each interrogation probe (e.g., PM probe) of a plurality of interrogation probes has from about 1 to about 20 or more corresponding mismatch control probes, such as from 2 to 20, 3 to 20, 4 to 20, 5 to 20, 6 to 20, 7 to 20, 8 to 20, 9 to 20, 10 to 20, 11 to 20, 12 to 20, 13 to 20, 14 to 20, 15 to 20, 16 to 20, 17 to 20, 18 to 20, or 19 to 20 corresponding mismatch control probes. In further embodiments, each interrogation probe has from about 1 to about 10, about 1 to about 5, about 1 to 4, 1 to 3, 2 or 1 corresponding mismatch probes. In some embodiments, each interrogation probe has about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more corresponding mismatch probes. These interrogation probes target unique regions within a target nucleic acid sequence, e.g., a 16S rRNA gene, and provide the means for identifying at least about 10, 20, 50, 100, 500, 1000, 2000, 5000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 250000, 500000, or 1000000 taxa. In some embodiments, multiple targets can be simultaneously assayed or detected in a single assay through a high-density oligonucleotide probe system. The sum of all target hybridizations is used to identify specific prokaryotic taxa.
The result is a more efficient and less time-consuming method of identifying unculturable or unknown organisms. The present application can also provide results that could not previously be achieved, e.g., providing results in hours where other methods would require days. In some embodiments, a microbiome (i.e., sample) can be assayed to determine the identity and optionally the abundance of its constituent microorganisms in less than 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 hour.
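As a hedged illustration of combining many probe-pair signals into per-taxon calls, the sketch below sums background-corrected PM minus MM intensities for each OTU and calls the OTU present when a minimum fraction of its probe pairs show PM above MM. The 0.9 fraction, the input layout, the intensities and the function name are hypothetical, not values taken from this disclosure.

```python
def call_otus(pairs, min_fraction=0.9):
    """Combine PM/MM probe-pair intensities into per-OTU calls.

    pairs: OTU -> list of (pm_signal, mm_signal) tuples, one per interrogation probe
    Returns OTU -> (called_present, summed background-corrected signal).
    """
    calls = {}
    for otu, signals in pairs.items():
        if not signals:
            continue  # no probes for this OTU: nothing to call
        positive = sum(pm > mm for pm, mm in signals)
        total = sum(max(pm - mm, 0.0) for pm, mm in signals)
        calls[otu] = (positive / len(signals) >= min_fraction, total)
    return calls


# Hypothetical intensities for two OTUs with three probe pairs each.
pairs = {
    "OTU_A": [(900.0, 100.0), (800.0, 200.0), (1000.0, 150.0)],
    "OTU_B": [(300.0, 280.0), (100.0, 400.0), (200.0, 250.0)],
}
print(call_otus(pairs))  # {'OTU_A': (True, 2250.0), 'OTU_B': (False, 20.0)}
```

The summed signal can serve as a relative abundance measure for called OTUs, while the positive-pair fraction guards against a single strong but non-specific probe driving the call.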

In some embodiments, the set of OTU-specific oligonucleotide probes comprises from about 1 to about 500 probes for each taxonomic group. In some embodiments, the probes are proteins, including antibodies, or nucleic acid molecules, including oligonucleotides or fragments thereof. In some embodiments, an oligonucleotide probe corresponds to a nucleotide fragment of the target nucleic acid molecule. In some embodiments, from about 1 to about 500, about 2 to about 200, about 5 to about 150, about 8 to about 100, about 10 to about 35, or about 12 to about 30 oligonucleotide probes can be designed for each taxonomic grouping. In other embodiments, a taxonomic group can have at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 40, or more probes. In some embodiments, various taxonomic groups can have different numbers of probes, while in other embodiments, all taxonomic groups have a fixed number of probes per group. Multiple probes in a taxonomic group can provide additional data that can be used to make a determination, also known as “making a call,” as to whether an OTU is present or not. Multiple probes also allow for the removal of one or more probes from the analysis based on insufficient signal strength, cross-hybridization or other anomalies. Removing probes can increase the confidence level of results and further allow for the detection of low-abundance microorganisms. The oligonucleotide probes can each be from about 5 to about 100 nucleotides, from about 10 to about 50 nucleotides, from about 15 to about 35 nucleotides, or from about 20 to about 30 nucleotides.
In some embodiments, the probes are at least 5-mers, 6-mers, 7-mers, 8-mers, 9-mers, 10-mers, 11-mers, 12-mers, 13-mers, 14-mers, 15-mers, 16-mers, 17-mers, 18-mers, 19-mers, 20-mers, 21-mers, 22-mers, 23-mers, 24-mers, 25-mers, 26-mers, 27-mers, 28-mers, 29-mers, 30-mers, 31-mers, 32-mers, 33-mers, 34-mers, 35-mers, 36-mers, 37-mers, 38-mers, 39-mers, 40-mers, 41-mers, 42-mers, 43-mers, 44-mers, 45-mers, 46-mers, 47-mers, 48-mers, 49-mers, 50-mers, 51-mers, 52-mers, 53-mers, 54-mers, 55-mers, 56-mers, 57-mers, 58-mers, 59-mers, 60-mers, 61-mers, 62-mers, 63-mers, 64-mers, 65-mers, 66-mers, 67-mers, 68-mers, 69-mers, 70-mers, 71-mers, 72-mers, 73-mers, 74-mers, 75-mers, 76-mers, 77-mers, 78-mers, 79-mers, 80-mers, 81-mers, 82-mers, 83-mers, 84-mers, 85-mers, 86-mers, 87-mers, 88-mers, 89-mers, 90-mers, 91-mers, 92-mers, 93-mers, 94-mers, 95-mers, 96-mers, 97-mers, 98-mers, 99-mers, 100-mers or combinations thereof.
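The probe-removal step described above can be sketched as a filter applied before the call: probes whose PM signal falls below a noise floor are dropped, no call is made if too few probes remain, and otherwise the OTU is called from the fraction of surviving probes whose PM signal exceeds the MM signal. The noise floor, the minimum probe count, the voting fraction and the example intensities are hypothetical parameters chosen for illustration.

```python
def filter_and_call(probe_signals, noise_floor, min_probes=3, min_fraction=0.75):
    """Remove weak probes from one OTU's probe set, then make a presence call.

    probe_signals: list of (pm_signal, mm_signal) tuples for the OTU's probes
    Returns (call, probes_used); call is None when too few probes survive.
    """
    kept = [(pm, mm) for pm, mm in probe_signals if pm >= noise_floor]
    if len(kept) < min_probes:
        return None, len(kept)
    votes = sum(pm > mm for pm, mm in kept)
    return votes / len(kept) >= min_fraction, len(kept)


# Hypothetical probe set: two probes sit in the noise below a floor of 100.
signals = [(500.0, 100.0), (450.0, 120.0), (40.0, 60.0), (20.0, 50.0), (600.0, 200.0)]
print(filter_and_call(signals, noise_floor=0.0))    # (False, 5)
print(filter_and_call(signals, noise_floor=100.0))  # (True, 3)
```

Voting over all five probes gives 3/5 = 0.6, below the 0.75 threshold; after the two noise-level probes are removed, the three remaining probes all exceed their MM controls and the OTU is called present.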

Some embodiments provide methods of selecting multiple, confirmatory, organism-specific or OTU-specific probes to increase the confidence of detection. In some embodiments, the methods also select one or more mismatch (MM) probes for every perfect match (PM) probe to minimize the effect of cross-hybridization by non-target regions. The organism-specific and OTU-specific oligonucleotide probes selected by the methods disclosed herein can simultaneously identify thousands of taxa present in an environmental sample and allow accurate identification of microorganisms and their phylogenetic relationships in a community of interest. Systems that use the organism-specific and OTU-specific oligonucleotide probes selected by the methods disclosed herein and the computational analysis disclosed herein have numerous advantages over rRNA gene sequencing techniques. Such advantages include reduced cost per microbiome analysis, and increased processing speed per sample or microbiome from both the physical analysis and the computational analysis point of view. In general, the analysis procedures are not adversely affected by chimeras, are not subject to creating artificial phylotypes, and are not subject to barcode PCR bias. Additionally, quantitative standards can be run with a microbiome sample disclosed herein.

Some embodiments provide a method for selecting and/or utilizing a set of OTU- or organism-specific oligonucleotide probes for use in an analysis system or bead multiplex system for simultaneously detecting a plurality of organisms in a sample. The method targets known diversity within target nucleic acid molecules to determine microbial community composition and establish a biosignature. The target nucleic acid molecule is typically a highly conserved polynucleotide. In some embodiments, the highly conserved polynucleotide is from a highly conserved gene, whereas in other embodiments the polynucleotide is from a highly conserved region of a gene with moderate or large sequence variation. In further embodiments, the highly conserved region can be an intron, exon, or a linking section of nucleic acid that separates two genes. In some embodiments, the highly conserved polynucleotide is from a “phylogenetic” gene. Phylogenetic genes include, but are not limited to, the 5.8S rRNA gene, 12S rRNA gene, 16S rRNA gene-prokaryotic, 16S rRNA gene-mitochondrial, 18S rRNA gene, 23S rRNA gene, 28S rRNA gene, gyrB gene, rpoB gene, fusA gene, recA gene, cox1 gene, and the nifD gene. With eukaryotes, the rRNA gene can be nuclear, mitochondrial, or both. In some embodiments, the 16S-23S rRNA gene internal transcribed spacer (ITS) can be used for differentiation of closely related taxa with or without the use of other rRNA genes. rRNA, e.g., 16S or 23S rRNA, acts directly in the protein assembly machinery as a functional molecule rather than having its genetic code translated into protein.
Due to structural constraints of 16S rRNA, specific regions throughout the gene have a highly conserved polynucleotide sequence, although non-structural segments can have a high degree of variability. Probing the regions of high variability can be used to identify OTUs at the single-species level, while regions of less variability can be used to identify OTUs that represent a subgenus, a genus, a subfamily, a family, a sub-order, an order, a sub-class, a class, a sub-phylum, a phylum, a sub-kingdom, or a kingdom. The methods disclosed herein can be used to select organism-specific and OTU-specific oligonucleotide probes that offer a high level of specificity for the identification of specific organisms, OTUs representing specific organisms, or OTUs representing specific taxonomic groups of organisms. The systems and methods disclosed herein are particularly useful in identifying closely related microorganisms and OTUs from a background or pool of closely related organisms.

The probes selected and/or utilized by the methodologies disclosed herein can be organized into OTUs that provide an assay with a sensitivity and/or specificity of more than 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%. In some embodiments, sensitivity and specificity depend on the hybridization signal strength, the number of probes in the OTU, the number of potential cross-hybridization reactions, the signal strength of the mismatch probes, if present, background noise, or combinations thereof. In some embodiments, an OTU containing one probe can provide an assay with a sensitivity and specificity of at least 90%, while another OTU can require at least 20 probes to provide an assay with a sensitivity and specificity of at least 90%.
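Sensitivity and specificity for a probe set can be estimated against a sample of known composition (for example, a mock community). This sketch uses the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), with OTUs represented as plain string labels; the labels, the example calls and the function name are illustrative.

```python
def sensitivity_specificity(called, truth, universe):
    """Sensitivity and specificity of presence calls against a known composition.

    called: OTUs called present; truth: OTUs actually present;
    universe: all OTUs represented on the array. Assumes truth is non-empty and
    smaller than universe, so neither denominator is zero.
    """
    negatives = universe - truth
    tp = len(called & truth)
    fn = len(truth - called)
    fp = len(called & negatives)
    tn = len(negatives - called)
    return tp / (tp + fn), tn / (tn + fp)


universe = {f"OTU_{i}" for i in range(10)}
truth = {"OTU_0", "OTU_1", "OTU_2", "OTU_3"}
called = {"OTU_0", "OTU_1", "OTU_2", "OTU_5"}
sens, spec = sensitivity_specificity(called, truth, universe)
print(sens, round(spec, 3))  # 0.75 0.833
```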

Some embodiments relate to methods for phylogenetic analysis system design and signal processing and interpretation for use in detecting and identifying a plurality of biomolecules and organisms in a sample. More specifically, some embodiments relate to a method of selecting a set of organism-specific oligonucleotide probes for use in detecting a plurality of organisms in a sample with a high confidence level. Some embodiments relate to a method of selecting a set of OTU-specific oligonucleotide probes for use in detecting a plurality of organisms in a sample with a high confidence level.

In the case of highly conserved polynucleotides like 16S rRNA that can have only one to a few nucleotides of sequence variability over any 15- to 30-bp region targeted by probes for discrimination between related microbial species, it is advantageous to maximize the probe-target sequence specificity in an assay system. Some embodiments provide methods of selecting organism-specific oligonucleotide probes that effectively minimize the influence of cross-hybridization. In some embodiments, the method comprises: (a) identifying sequences of a target nucleic acid molecule corresponding to the plurality of organisms; (b) determining the cross-hybridization pattern of a candidate organism-specific oligonucleotide probe to the target nucleic acid molecule from the plurality of organisms, wherein the candidate oligonucleotide probe corresponds to a sequence fragment of the target nucleic acid molecule from the plurality of organisms; (c) determining the cross-hybridization pattern of a variant of the candidate organism-specific oligonucleotide probe to the target nucleic acid molecule from the plurality of organisms, wherein the variant of the candidate organism-specific oligonucleotide probe comprises at least 1 nucleotide mismatch compared to the candidate organism-specific oligonucleotide probe; and (d) selecting or rejecting the candidate organism-specific oligonucleotide probe on the basis of the cross-hybridization pattern of the candidate organism-specific oligonucleotide probe and the cross-hybridization pattern of the variant of the candidate organism-specific oligonucleotide probe. In some embodiments, a method of selecting a set of OTU-specific oligonucleotide probes for use in detecting a plurality of organisms in a sample is provided.
In some embodiments, the method comprises: (a) identifying sequences of a target nucleic acid molecule corresponding to the plurality of organisms; (b) clustering the sequences of the target nucleic acid molecule from the plurality of organisms into one or more Operational Taxonomic Units (OTUs), wherein each OTU comprises one or more groups of similar sequences; (c) determining the cross-hybridization pattern of a candidate OTU-specific oligonucleotide probe to the OTUs, wherein the candidate OTU-specific oligonucleotide probe corresponds to a sequence fragment of the target nucleic acid molecule from one of the plurality of organisms; (d) determining the cross-hybridization pattern of a variant of the candidate OTU-specific oligonucleotide probe to the OTUs, wherein the variant of the candidate OTU-specific oligonucleotide probe comprises at least 1 nucleotide mismatch compared to the candidate OTU-specific oligonucleotide probe; and (e) selecting or rejecting the candidate OTU-specific oligonucleotide probe on the basis of the cross-hybridization pattern of the candidate OTU-specific oligonucleotide probe to the OTUs and the cross-hybridization pattern of the variant of the candidate OTU-specific oligonucleotide probe to the OTUs. In some embodiments, candidate OTU-specific oligonucleotide probes are rejected when the candidate OTU-specific oligonucleotide probe or its variant is predicted to cross-hybridize with other target sequences. In some embodiments, a predetermined amount of predicted cross-hybridization is allowed.
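Step (b), clustering target sequences into OTUs, can be sketched with simple greedy identity clustering in the style of centroid-based clustering tools: each sequence joins the first OTU whose seed it matches at or above a threshold identity, and otherwise seeds a new OTU. This is a swapped-in, illustrative technique rather than the clustering method of this disclosure; it assumes the sequences are already aligned to equal length, and the threshold, sequences and names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering: each sequence joins the first OTU whose seed it matches
    at >= threshold identity; otherwise it becomes the seed of a new OTU."""
    otus = []  # list of (seed sequence, member names)
    for name, seq in seqs.items():
        for seed, members in otus:
            if identity(seed, seq) >= threshold:
                members.append(name)
                break
        else:
            otus.append((seq, [name]))
    return [members for _, members in otus]


# Hypothetical aligned 20-nt fragments.
seqs = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGA",  # one difference from s1 (95% identity)
    "s3": "TTTTTTTTTTTTTTTTTTTT",
}
print(greedy_otus(seqs, threshold=0.9))  # [['s1', 's2'], ['s3']]
```

Greedy clustering is order-dependent, so input sequences are often sorted (for example by abundance or length) before clustering.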

In some embodiments, selected oligonucleotide probes are synthesized by any relevant method known in the art. Some examples of suitable methods include printing with fine-pointed pins onto glass slides, photolithography using pre-made masks, photolithography using dynamic micromirror devices, ink jet printing, or electrochemistry. In one example, a photolithographic method can be used to directly synthesize the chosen oligonucleotide probes onto a surface. Suitable examples for the surface include glass, plastic, silicon and any other surface available in the art. In certain examples, the oligonucleotide probes can be synthesized on a glass surface at an approximate density from about 1,000 probes per μm² to about 100,000 probes per μm², preferably from about 2000 probes per μm² to about 50,000 probes per μm², more preferably from about 5000 probes per μm² to about 20,000 probes per μm². In one example, the density of the probes is about 10,000 probes per μm². The number of probes on the array can be quite large, e.g., at least 10⁵, 10⁶, 10⁷, 10⁸ or 10⁹ probes per array. Usually, for large arrays only a relatively small proportion (i.e., less than about 1%, 0.1%, 0.01%, 0.001%, 0.00001%, 0.000001% or 0.0000001%) of the total number of probes of a given length target an individual OTU. Frequently, lower limit arrays have no more than 10, 25, 50, 100, 500, 1000, 5000, 10000, 25000, 50000, 100000 or 250000 probes.

Typically, the arrays or microparticles have probes to one or more highly conserved polynucleotides. The arrays or microparticles can have further probes (e.g., confirmatory probes) that hybridize to functionally expressed genes, thereby providing an alternate or confirmatory signal upon which to base the identification of a taxon. For example, an array can contain probes to 16S rRNA gene sequences from Yersinia pestis and Vibrio cholerae and also confirmatory probes to the Y. pestis caf1 virulence gene or the V. cholerae zonula occludens toxin (zot) gene. The detection of hybridization signals based on probes binding to 16S rRNA polynucleotides associated with a particular OTU, coupled with the detection of a hybridization signal based on a confirmatory probe, can provide a higher level of confidence that the OTU is present. For instance, if hybridization signals are detected for the probes associated with the Y. pestis OTU and the confirmatory probe also displays a hybridization signal for the expression of Y. pestis caf1, then the confidence level ascribed to the presence or quantity of Y. pestis will be higher than the confidence level obtained from the use of OTU probes alone.

A range of probe lengths can be employed on the arrays or microparticles. As noted above, a probe can consist exclusively of a complementary segment, or can have one or more complementary segments juxtaposed by flanking, trailing and/or intervening segments. In the latter situation, the total length of the complementary segment(s) can be more important than the length of the probe. In functional terms, the complementary segment(s) of the PM probes should be sufficiently long to allow the PM probes to hybridize more strongly to a target polynucleotide, e.g., 16S rRNA, than an MM probe does. A PM probe usually has a single complementary segment having a length of at least 15 nucleotides, and more usually at least 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 30 bases exhibiting perfect complementarity.

In some arrays or microparticles, all probes are the same length. In other arrays or microparticles, probe length varies between quantification standard (QS) probes, negative control (NC) probes, probe pairs, probe sets (OTUs) and combinations thereof. For example, some arrays can have groups of OTUs that comprise probe pairs that are all 23mers, together with other groups of OTUs or probe sets that comprise probe pairs that are all 25mers. Additional groups of probe pairs of other lengths can be added. Thus, some arrays can contain probe pairs having sizes of 15mers, 16mers, 17mers, 18mers, 19mers, 20mers, 21mers, 22mers, 23mers, 24mers, 25mers, 26mers, 27mers, 28mers, 29mers, 30mers, 31mers, 32mers, 33mers, 34mers, 35mers, 36mers, 37mers, 38mers, 39mers, 40mers or combinations thereof. Other arrays can have different size probes within the same group, OTU, or probe set. In these arrays, the probes in a given OTU or probe set can vary in length independently of each other. Using probes of different lengths can equalize hybridization signals among probes whose hybridization stability differs at the pH, temperature, and ionic conditions of the reaction.
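One way to picture length-based signal equalization is to choose, for each probe, the length whose estimated melting temperature lands closest to a common target. The sketch below is illustrative only: it assumes the basic G+C approximation Tm = 64.9 + 41 * (nGC - 16.4) / N, which is not part of this disclosure, and the function names `basic_tm` and `choose_length` are hypothetical.

```python
def basic_tm(seq: str) -> float:
    """Rough melting temperature (deg C) from the basic G+C formula
    Tm = 64.9 + 41 * (nGC - 16.4) / N. An assumed heuristic, not a
    thermodynamic model."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


def choose_length(target: str, start: int, target_tm: float,
                  min_len: int = 15, max_len: int = 40) -> int:
    """Return the probe length (min_len..max_len) whose subsequence
    target[start:start + length] has an estimated Tm closest to target_tm."""
    best_len, best_gap = min_len, float("inf")
    for length in range(min_len, max_len + 1):
        sub = target[start:start + length]
        if len(sub) < length:
            break  # ran past the end of the target sequence
        gap = abs(basic_tm(sub) - target_tm)
        if gap < best_gap:
            best_len, best_gap = length, gap
    return best_len


# Hypothetical targets: a G+C-rich and an A+T-rich region
print(choose_length("GC" * 20, 0, 60.0))
print(choose_length("AT" * 20, 0, 60.0))
```

With the same target Tm, the A+T-rich region is assigned a longer probe than the G+C-rich region, which is the equalizing effect described above. A nearest-neighbor thermodynamic model would be a more realistic choice for real probe design.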

In another aspect of the present application, a system is provided for determining the presence or quantity of a plurality of different OTUs in a single assay, where the system comprises a plurality of polynucleotide interrogation probes, a plurality of polynucleotide positive control probes, and a plurality of polynucleotide negative control probes. In some embodiments, the system is capable of detecting the presence, absence, relative abundance, and/or quantity of at least about 5, 10, 20, 50, 100, 250, 500, 1000, 5000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 250000, 500000 or 1000000 OTUs in a sample using a single assay. In some embodiments, the polynucleotide positive control probes include 1) probes that target sequences of prokaryotic or eukaryotic metabolic genes spiked into the target nucleic acid sequences in defined quantities prior to fragmentation, or 2) probes complementary to a pre-labeled oligonucleotide added into the hybridization mix after fragmentation and labeling. The control added prior to fragmentation collectively tests the fragmentation, biotinylation, hybridization, staining and scanning efficiency of the system. It also allows the overall fluorescent intensity to be normalized across multiple analysis components used in a single or combined experiment, such as when two or more arrays are used in a single experiment or when data from two separate experiments are combined. The second control directly assays the hybridization, staining and scanning of the system. Both types of control can be used in a single experiment.
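The normalization role of the pre-fragmentation spike-in control can be sketched as a simple global scaling. Assumptions: each array is a dict mapping probe ID to background-subtracted intensity, the spike-in control probe IDs are known, and each array is scaled so that its mean control intensity matches the mean across all arrays. The function name `normalize_to_spike_ins` is hypothetical, and real pipelines may use more robust scaling.

```python
from statistics import mean


def normalize_to_spike_ins(arrays, control_ids):
    """Scale each array so its mean spike-in control intensity equals the
    average of those control means across all arrays.

    arrays: list of dicts mapping probe ID -> intensity (one dict per array).
    control_ids: probe IDs of the spike-in positive controls.
    """
    control_means = [mean(a[p] for p in control_ids) for a in arrays]
    reference = mean(control_means)
    normalized = []
    for array, control_mean in zip(arrays, control_means):
        factor = reference / control_mean  # per-array scaling factor
        normalized.append({probe: value * factor for probe, value in array.items()})
    return normalized


# Hypothetical data: the second array is uniformly twice as bright
arrays = [
    {"qs1": 100.0, "qs2": 300.0, "otu_a": 50.0},
    {"qs1": 200.0, "qs2": 600.0, "otu_a": 100.0},
]
print(normalize_to_spike_ins(arrays, ["qs1", "qs2"]))
```

After scaling, `otu_a` reads 75.0 on both arrays, so the brightness difference between the two runs no longer masquerades as a difference in abundance.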

In some embodiments, the QS standards (positive controls) are PM probes. In other embodiments, the QS standards are PM and MM probe pairs. In further embodiments, the QS standards comprise a combination of PM and MM probe pairs and PM probes without corresponding MM probes. In some embodiments, the QS standards comprise at least one, two, three, four, five, six, seven, eight, nine, ten or more MM probes for each corresponding PM probe. In a further embodiment, the QS standards comprise at least one, two, three, four, five, six, seven, eight, nine, ten or more PM probes for each corresponding MM probe. A system can comprise at least 1 positive control probe for each 1, 10, 100, or 1000 different interrogation probes.

In some cases, the spiked-in oligonucleotides that are complementary to the positive control probes vary in G+C content, uracil content, concentration, or combinations thereof. In some embodiments, the G+C % ranges from about 30% to about 70%, about 35% to about 65% or about 40% to about 60%. QS standards can also be chosen based on the uracil incorporation frequency. The QS standards can incorporate uracil in a range from about 1 in 100 to about 60 in 100, about 4 in 100 to about 50 in 100, or about 10 in 100 to about 50 in 100. In some cases, the concentration of these added oligonucleotides will range over 1, 2, 3, 4, 5, 6, or 7 orders of magnitude. Concentration ranges of about 10⁵ to 10¹⁴, 10⁶ to 10¹³, 10⁷ to 10¹², 10⁷ to 10¹¹, 10⁸ to 10¹¹, and 10⁸ to 10¹⁰ can be employed and generally feature a linear hybridization signal response across the range. In some embodiments, positive control probes for carrying out the methods disclosed herein comprise polynucleotides that are complementary to the positive control sequences shown in Table 1. Other genes that can be used as targets for positive controls include genes encoding structural proteins, proteins that control growth, cell cycle or reproductive regulation, and housekeeping genes. Additionally, synthetic genes based on highly conserved genes or other highly conserved polynucleotides can be added to the sample. Useful highly conserved genes from which synthetic genes can be designed include 16S rRNA genes, 18S rRNA genes, and 23S rRNA genes. Exemplary control probes are provided as SEQ ID NOs:51-100.
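Whether the spike-in controls respond linearly across their concentration range can be checked by regressing signal on concentration. The sketch assumes the check is done on a log10-log10 scale with ordinary least squares, where a slope near 1 and an r² near 1 indicate a proportional response; the function name `fit_log_log` and the example values are hypothetical.

```python
import math


def fit_log_log(concentrations, signals):
    """Ordinary least-squares fit of log10(signal) = slope * log10(conc) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical spike-in series spanning three orders of magnitude
concentrations = [1e8, 1e9, 1e10, 1e11]
signals = [c * 1e-6 for c in concentrations]  # perfectly proportional signals
slope, intercept, r2 = fit_log_log(concentrations, signals)
print(slope, intercept, r2)
```

Real spike-in data would show saturation or a noise floor at the ends of the range; those points would pull the slope and r² away from 1.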

TABLE 1 Positive Control Sequences

| Positive Control ID | Description |
|---|---|
| AFFX-BioB-5_at | E. coli biotin synthetase |
| AFFX-BioB-M_at | E. coli biotin synthetase |
| AFFX-BioC-5_at | E. coli bioC protein |
| AFFX-BioC-3_at | E. coli bioC protein |
| AFFX-BioDn-3_at | E. coli dethiobiotin synthetase |
| AFFX-CreX-5_at | Bacteriophage P1 cre recombinase protein |
| AFFX-DapX-5_at | B. subtilis dapB, dihydrodipicolinate reductase |
| AFFX-DapX-M_at | B. subtilis dapB, dihydrodipicolinate reductase |
| YFL039C | Saccharomyces, gene for actin (Act1p) protein |
| YER022W | Saccharomyces, RNA polymerase II mediator complex subunit (SRB4p) |
| YER148W | Saccharomyces, TATA-binding protein, general transcription factor (SPT15) |
| YEL002C | Saccharomyces, beta subunit of the oligosaccharyl transferase (OST) glycoprotein complex (WBP1) |
| YEL024W | Saccharomyces, ubiquinol-cytochrome-c reductase (RIP1) |
| Synthetic 16S rRNA controls | |
| SYNM neurolyt_st | Synthetic derivative of Mycoplasma neurolyticum 16S rRNA gene |
| SYNLc.oenos_st | Synthetic derivative of Leuconostoc oenos 16S rRNA gene |
| SYNCau.cres8_st | Synthetic derivative of Caulobacter crescentus 16S rRNA gene |
| SYNFer.nodosm_st | Synthetic derivative of Fervidobacterium nodosum 16S rRNA gene |
| SYNSap.grandi_st | Synthetic derivative of Saprospira grandis 16S rRNA gene |

In some embodiments, the negative controls comprise PM and MM probe pairs. In further embodiments, the negative controls comprise a combination of PM and MM probe pairs and PM probes without corresponding MM probes. In other embodiments, the negative control probes comprise at least one, two, three, four, five, six, seven, eight, nine, ten or more MM probes for each corresponding negative control PM probe. A system can comprise at least 1 negative control probe for each 1, 10, 100, or 1000 different interrogation probes (PMs).

In some embodiments, the negative control probes hybridize weakly, if at all, to 16S rRNA gene or other highly conserved gene targets. The negative control probes can be complementary to metabolic genes of prokaryotic or eukaryotic origin. Generally, with negative control probes, no target material is spiked into the sample. In some embodiments, negative control probes are from the same collection of probes that are also used for positive controls, but, in contrast to the positive control probe methodology, no material complementary to the negative control probes is spiked into the sample. In essence, the control probes can be universal control probes and play the role of positive or negative control probes depending on the system's design. One of skill in the art will appreciate that the universal control probes are not limited to highly conserved sequence analysis systems and have applications beyond the present embodiments disclosed herein.

In a further embodiment, probes to non-highly conserved polynucleotides are added to a system to provide species-specific identification or confirmation of results achieved with the probes to the highly conserved polynucleotides. Usually, these “confirmatory” probes cross-hybridize very weakly, if at all, to the highly conserved polynucleotides recognized by the perfect match probes. Useful species-specific genes include metabolic genes, genes encoding structural proteins, proteins that control growth, cell cycle or reproductive regulation, housekeeping genes, or genes that encode virulence factors, toxins, or other pathogenic factors. In some embodiments, the system comprises at least 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 5000 or 10000 species-specific probes.

In some embodiments, a system comprises an array. Non-limiting examples of arrays include microarrays, bead arrays, through-hole arrays, well arrays, and other arrays known in the art suitable for use in hybridizing probes to targets. Arrays can be arranged in any appropriate configuration, such as, for example, a grid of rows and columns. Some areas of an array comprise the OTU detection probes, whereas other areas can be used for image orientation, normalization controls, signal scaling, noise reduction processing, or other analyses. Control probes can be placed in any location in the array, including along the perimeter of the array, diagonally across the array, in alternating sections or randomly. In some embodiments, the control probes on the array comprise probe pairs of PM and MM probes. The number of control probes can vary, but typically the number of control probes on the array ranges from 1 to about 500,000. In some embodiments, at least 10, 100, 500, 1000, 5000, 10000, 25000, 50000, 100000, 250000 or 500000 control probes are present. When control probe pairs are used, the probe pairs will range from 1 to about 250000 pairs. In some embodiments, at least 5, 50, 250, 500, 2,500, 5,000, 12500, 25000, 50000, 125000 or 250000 control probe pairs are present. The arrays can have other components besides the probes, such as linkers attaching the probes to a support. In some embodiments, materials for fabricating the array can be obtained from Affymetrix (Santa Clara, Calif.), GE Healthcare (Little Chalfont, Buckinghamshire, United Kingdom) or Agilent Technologies (Palo Alto, Calif.).
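As an illustration of one layout option named above, the sketch lists the grid positions on the perimeter of a rows-by-columns array, where control probes could be placed while interior positions hold OTU detection probes. The function names `perimeter_positions` and `split_layout` and the grid dimensions are hypothetical.

```python
def perimeter_positions(rows: int, cols: int) -> list[tuple[int, int]]:
    """Return (row, col) positions on the outer edge of a rows x cols grid."""
    return [(r, c) for r in range(rows) for c in range(cols)
            if r in (0, rows - 1) or c in (0, cols - 1)]


def split_layout(rows: int, cols: int):
    """Split a grid into perimeter positions (for control probes) and
    interior positions (for OTU detection probes)."""
    edge = set(perimeter_positions(rows, cols))
    interior = [(r, c) for r in range(rows) for c in range(cols)
                if (r, c) not in edge]
    return sorted(edge), interior


edge, interior = split_layout(4, 5)
print(len(edge), len(interior))
```

A diagonal, alternating-section, or random placement would only change the predicate that selects control positions.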

Besides arrays where probes are attached to the array substrate, numerous other technologies can be employed in the disclosed system. In some embodiments, the probes are attached to beads that are then placed on an array as disclosed by Ng et al. (Ng et al. A spatially addressable bead-based biosensor for simple and rapid DNA detection. Biosensors & Bioelectronics, 23:803-810, 2008).

In some embodiments, probes are attached to beads or microspheres, the hybridization reactions are performed in solution, and then the beads are analyzed by flow cytometry, as exemplified by the Luminex multiplexed assay system. In this analysis system, homogeneous bead subsets, each with beads that are tagged or labeled with a plurality of identical probes, are combined to produce a pooled bead set that is hybridized with a sample and then analyzed in real time with flow cytometry, as disclosed in U.S. Pat. No. 6,524,793. Bead subsets can be distinguished from each other by variations in the tags or labels, e.g., using variability in laser excitable dye content.

In some further embodiments, probes are attached to cylindrical glass microbeads, as exemplified by the Illumina VeraCode multiplexed assay system. Here, subsets of microbeads embedded with identical digital holographic elements are used to create unique subsets of probe-labeled microbeads. After hybridization, the microbeads are excited by laser light, and the microbead code and probe label are read in a real-time multiplex assay.

In some embodiments, a solution-based assay system is employed, as exemplified by the NanoString nCounter Analysis System (Geiss G et al. Direct multiplexed measurement of gene expression with color-coded probe pairs. Nature Biotech. 26:317-325, 2008). With this methodology, a sample is mixed with a solution of reporter probes that recognize unique sequences and capture probes that allow the complexes formed between the nucleic acids in the sample and the reporter probes to be immobilized on a solid surface for data collection. Each reporter probe is color-coded and is detected through fluorescence.

In a further embodiment, branched DNA technology, as exemplified by the Panomics QuantiGene Plex 2.0 assay system, is used. Branched DNA technology comprises a sandwich nucleic acid hybridization assay for RNA detection and quantification that amplifies the reporter signal rather than the sequence. By measuring the RNA at the sample source, the assay avoids variations or errors inherent to extraction and amplification of target polynucleotides. The QuantiGene Plex technology can be combined with a multiplex bead-based assay system, such as the Luminex system described above, to enable simultaneous quantification of multiple RNA targets directly from whole cells or purified RNA preparations.

Sample Sources and Nucleic Acid Preparation

In some embodiments, the sample used can be an ecosystem sample. Ecosystems include microbiomes associated with plants, animals, and humans. Animal and human associated microbiomes include those found in the gastrointestinal tract, respiratory system, nares, urogenital tract, mammary glands, oral cavity, auditory canal, feces, urine, and skin. In some embodiments, the sample can be any kind of clinical or medical sample. For example, mammalian samples from blood, urine, feces, nares, the lungs, the gut, other bodily fluids or excretions, materials derived therefrom, or combinations thereof can be assayed using the array system. Also, the probes selected by the methods disclosed herein and the array system of the present embodiments can be used to identify an infection in the blood of an animal. The probes selected by the methods disclosed herein and the array system of the present embodiments can also be used to assay medical samples from sites that are directly or indirectly exposed to the outside of the body, such as the lungs, ear, nose, throat, the entirety of the digestive system or the skin of an animal. In some embodiments, a sample includes cell culture samples and/or bacterial culture samples. In some embodiments, a sample comprises a pulmonary sample from a subject, including but not limited to sputum, endotracheal aspirate, a bronchoalveolar lavage sample, a swab of the endotrachea, materials derived therefrom, or combinations thereof.

Techniques and systems to obtain nucleic acids from multiple organisms in a sample, such as an ecosystem, medical, or clinical sample, are well known by persons skilled in the art. In some embodiments, a sample is treated with an agent that selectively modifies a nucleic acid of dead cells. In some embodiments, samples treated with an agent that selectively modifies a nucleic acid of dead cells are treated with a second agent, such as light (e.g., visible light), and optionally washed to remove excess selective agent before extracting nucleic acid. Many commercially available DNA extraction and purification kits can be used. Samples with less than 2 pg of purified DNA can require amplification, which can be performed using conventional techniques known in the art, such as a whole community genome amplification (WCGA) method (Wu et al., Appl. Environ. Microbiol. (2006) 72, 4931-4941). In some embodiments, highly conserved sequences such as those found in the 16S rRNA gene, 23S rRNA gene, 5S rRNA gene, 5.8S rRNA gene, 12S rRNA gene, 18S rRNA gene, 28S rRNA gene, gyrB gene, rpoB gene, fusA gene, recA gene, cox1 gene and nifD gene are amplified. Usually, amplification is performed using PCR, but other types of nucleic acid amplification can be employed. Generally, amplification is performed using a single pair of universal primers specific to a highly conserved sequence. For redundancy or for an increased total amplicon concentration, two or more universal primer pairs, each specific to a different highly conserved sequence, can be used. Representative PCR primers include the bacterial primers 27F and 1492R. In some embodiments, a nucleic acid sample is amplified using a collection of primers each comprising one or more nucleotide positions selected at random from two or more different nucleotides. In some embodiments, primers, nucleotides, or other reagents used in an amplification reaction are labeled to produce labeled amplification products.
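Primers whose positions are drawn from two or more nucleotides are commonly written with IUPAC ambiguity codes. The sketch expands such a degenerate primer into every concrete sequence it represents. The example uses the commonly published 27F sequence, AGAGTTTGATCMTGGCTCAG, in which M stands for A or C; the function name `expand_degenerate` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes -> the bases each code allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand_degenerate("AGAGTTTGATCMTGGCTCAG"))
```

The number of expanded sequences is the product of the options at each position, so each added N multiplies the primer pool by four.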

A gel electrophoresis method can also be used to isolate community RNA (McGrath et al., J. Microbiol. Methods (2008) 75:172-176). Samples with less than 5 pg of purified RNA may require amplification, which can be performed using conventional techniques known in the art, such as a whole community RNA amplification approach (WCRA) (Gao et al., Appl. Environ. Microbiol. (2007) 73:563-571) to obtain cDNA. In some embodiments, sampling and DNA extraction are conducted as previously described (DeSantis et al., Microbial Ecology, 53(3):371-383, 2007).

In some embodiments, DNA; total RNA, or a fraction thereof, including rRNA, 16S rRNA, and 23S rRNA; or combinations thereof are directly labeled and used without any amplification.

Probe Preparations

Techniques and means for generating oligonucleotide probes to be used on analysis systems, beads or in other systems are well known by persons skilled in the art. For example, the oligonucleotide probes can be generated by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., Nucleic Acids Res. 14:5399-5407 (1986); McBride et al., Tetrahedron Lett. 24:246-248 (1983)). Synthetic sequences are typically between about 10 and about 500 bases in length, more typically between about 15 and about 100 bases, and most preferably between about 20 and about 40 bases in length. In some embodiments, synthetic nucleic acids include non-natural bases, such as, but by no means limited to, inosine. An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., Nature 363:566-568 (1993); U.S. Pat. No. 5,539,083). In some embodiments, at least 10, 25, 50, 100, 500, 1,000, 5,000, 10,000, 20,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 500,000, 1,000,000 or 2,000,000 probes are included on the array. In further embodiments, each PM probe has one or more corresponding MM probes present on the array. Typically, each PM-MM probe pair is associated with an OTU. In some embodiments, at least 10, 25, 50, 100, 500, 1,000, 5,000, 10,000, 20,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000 or 500,000 probe pairs are placed on the array. Generally, sets of probe pairs have at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 or 35 probe pairs present.

In some embodiments, positive control probes that are complementary to particular sequences in the target sequences (e.g., 16S rRNA gene) are used as internal quantification standards (QS) and included in the system. In other embodiments, positive control probes, also known as internal DNA quantification standard (QS) probes, are probes that hybridize to spiked-in nucleic acid sequence targets. Usually, the sequences are from metabolic genes. In some embodiments, negative control (NC) probes, i.e., probes that are not complementary or do not appreciably hybridize to sequences in the target sequences (e.g., 16S rRNA gene), are included on the array. Unlike the QS probes, no target material is spiked into the sample mix for the NC probes prior to sample processing.

In some embodiments, the probes are synthesized separately and then attached to a solid support or surface, which can be made, e.g., from glass, latex, plastic (e.g., polypropylene, nylon, polystyrene), polyacrylamide, nitrocellulose, gel, silicon, or other porous or nonporous material. In some embodiments, the surface is spherical or cylindrical, as in the case of microbeads or rods. In other embodiments, the surface is planar, as in an array or microarray.

Oligonucleotides produced using techniques known in the art can be built on and/or coupled to microspheres, beads, microbeads, rods, or other microscopic particles for use in arrays, flow cytometry, and other multiplex assay systems. Numerous microparticles from about 0.01 to 100 micrometers in diameter are commercially available. Generally, microparticles of about 0.1-50 μm, about 1-20 μm, or about 3-10 μm are preferred. The size and shape of microparticles can be uniform or can vary. In some embodiments, sublots of different sizes, shapes or both are conjugated to probes before the sublots are combined to make a final mixed lot of labeled microparticles. The individual sublots can therefore be distinguished and classified based on their size and shape. The size of the microparticles can be measured in practically any flow cytometry apparatus by forward (small-angle) scattered light. The shape of the particles can also be discriminated by flow cytometry, e.g., by a high-resolution slit-scanning method.

Microparticles can be made out of any solid or semisolid material, including glass, glass composites, metals, ceramics, or polymers. Frequently, the microparticles are polystyrene or latex material, but any type of polymeric material is acceptable, including but not limited to brominated polystyrene, polyacrylic acid, polyacrylonitrile, polyacrylamide, polyacrolein, polybutadiene, polydimethylsiloxane, polyisoprene, polyurethane, polyvinylacetate, polyvinylchloride, polyvinylpyridine, polyvinylbenzylchloride, polyvinyltoluene, polyvinylidene chloride, polydivinylbenzene, polymethylmethacrylate, or combinations thereof. Microparticles can be magnetic or non-magnetic and can also have a fluorescent dye, quantum dot, or other indicator material incorporated into the microparticle structure or attached to the surface of the microparticles. Frequently, microparticles also contain 1 to 30% of a cross-linking agent, such as divinyl benzene, ethylene glycol dimethacrylate, trimethylol propane trimethacrylate, or N,N′-methylene-bis-acrylamide, or other functionally equivalent agents known in the art.

Embodiments disclosed herein are applicable for use in any analysis system, including but not limited to bead or solution multiplex reaction platforms, or across multiple platforms, for example, Affymetrix GeneChip® Arrays, Illumina BeadChip® Arrays, Luminex xMAP® Technology, Agilent Two-Channel Arrays, MAGIChips (Microarrays of Gel-Immobilized Compounds) or the NanoString nCounter Analysis System. Affymetrix (Santa Clara, Calif., USA) platform DNA arrays can have the oligonucleotide probes (approximately 25mers) synthesized directly on the glass surface by a photolithography method at an approximate density of 10,000 molecules per μm² (Chee et al., Science (1996) 274:610-614). Spotted DNA arrays use oligonucleotides that are synthesized individually at a predefined concentration and are applied to a chemically activated glass surface. In general, oligonucleotide lengths can range from a few nucleotides to hundreds of bases, but are typically from about 10mer to 50mer, about 15mer to 40mer, or about 20mer to about 30mer in length.

Target Labeling

In some embodiments, the nucleic acid targets are labeled so that a laser scanner tuned to a specific wavelength of light can measure the number of fluorescent molecules that hybridized to a specific DNA probe. For arrays, the nucleic acid targets are typically fragmented to between 15 and 100 nucleotides in length, and a biotinylated nucleotide is added to the end of each fragment by terminal deoxynucleotidyl transferase. At a later stage, the biotinylated fragments that hybridize to the oligonucleotide probes are used as a substrate for the addition of multiple phycoerythrin fluorophores by a sandwich (streptavidin) method. For some arrays, such as those made by Agilent or NimbleGen, the purified community DNA can be fluorescently labeled by random priming using the Klenow fragment of DNA polymerase, and more than one fluorescent moiety can be used (e.g., controls could be labeled with Cy3 and experimental samples labeled with Cy5 for direct comparison by hybridization to a single array). Some labeling methods incorporate the molecular label into the target during an amplification or enzymatic step to produce multiple labeled copies of the target.

In some embodiments, the detection system is able to measure the microbial diversity of complex communities without PCR amplification and, consequently, without the inherent biases associated with PCR amplification. In some embodiments, nucleic acid from dead cells is selectively removed before detection, such that detection is directed to determining the presence, absence, relative abundance, and/or quantity of live organisms in a sample, such as live members of one or more OTUs. Actively metabolizing cells typically contain about 20,000 or more ribosomes for protein assembly, whereas quiescent or dead cells have few. In some embodiments, rRNA can be purified directly from a sample and processed with no amplification step, thereby reducing or avoiding bias caused by preferential amplification of some sequences over others. Thus, in some embodiments, the signal from the analysis system can reflect the true number of rRNA molecules that are present in the sample. This can be expressed as the number of cells multiplied by the number of rRNA copies within each cell. The number of cells in a sample can then be inferred by several different methods, such as, for example, quantitative real-time PCR or FISH (fluorescence in situ hybridization). The average number of ribosomes within each cell can then be calculated by dividing the total number of rRNA molecules by the number of cells.
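The arithmetic relating the amplification-free rRNA signal to ribosomes per cell can be sketched as follows. The inputs are assumed to be already calibrated: a total rRNA molecule count per OTU inferred from the array signal and an independent cell count per OTU (for example from qPCR or FISH). The function names and numbers are hypothetical.

```python
def mean_ribosomes_per_cell(total_rrna_copies: float, cell_count: float) -> float:
    """Average rRNA copies per cell = total rRNA molecules / cell count.

    total_rrna_copies: rRNA molecules inferred from the array signal
        (cells x rRNA copies per cell).
    cell_count: independent cell estimate, e.g. from qPCR or FISH.
    """
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    return total_rrna_copies / cell_count


def ribosomes_per_cell_by_otu(rrna_by_otu, cells_by_otu):
    """Apply the per-cell calculation to every OTU present in both inputs."""
    return {otu: mean_ribosomes_per_cell(rrna_by_otu[otu], cells_by_otu[otu])
            for otu in rrna_by_otu if otu in cells_by_otu}


# Hypothetical numbers: 2.0e9 rRNA molecules measured for 1.0e5 cells
print(mean_ribosomes_per_cell(2.0e9, 1.0e5))  # 20000.0
```

A result on the order of 20,000 would be consistent with the ribosome content given above for actively metabolizing cells.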

Hybridization

Hybridizations can be carried out under conditions well known by persons skilled in the art. See Rhee et al. (Appl. Environ. Microbiol. (2004) 70:4303-4317) and Wu et al. (Appl. Environ. Microbiol. (2006) 72:4931-4941). The temperature can be varied to reduce or increase stringency and allow the detection of more or less divergent sequences. Robotic hybridization and stringency wash stations can be used to give more consistent results and reduce processing time. In some embodiments, the hybridization and washing process can be accomplished in less than about half an hour, 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours, 11 hours, 12 hours, 14 hours, 16 hours, 18 hours, 20 hours or 24 hours. Generally, hybridization and washing times are reduced for microparticle-based detection systems owing to the greater accessibility of the probes to the target molecules. Generally, hybridization times can be reduced for low-complexity assays and/or assays for which there is an excess of target analytes.

Signal Quantification

After hybridization, arrays can be scanned using any suitable scanning device. Non-limiting examples of conventional microarray scanners include the GeneChip Scanner 3000 or GeneArray Scanner (Affymetrix, Santa Clara, Calif.) and the ProScan Array (Perkin Elmer, Boston, Mass.), which can be equipped with lasers having resolutions of 10 μm or finer. The scanned images can be captured as pixel images, saved, and analyzed by quantifying the pixel density (intensity) of each spot on the array using image quantification software (e.g., GeneChip Microarray Analysis Suite, version 5.1, Affymetrix, Santa Clara, Calif.; and ImaGene 6.0, Biodiscovery Inc., Los Angeles, Calif., USA). For each probe, an individual signal value can be obtained through image parsing and conversion to xy-coordinates. Intensity summaries for each feature can be created, and variance estimations among the pixels comprising a feature can be calculated.
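The per-feature intensity summary and pixel variance described above can be sketched for an idealized image. The sketch assumes square features of a known pixel size laid out on a regular grid aligned with the image origin, so no spot finding or grid alignment is performed; `summarize_features` is a hypothetical name and the image values are made up.

```python
from statistics import mean, pvariance


def summarize_features(image, feature_size):
    """Summarize square features on a regular grid.

    image: 2-D list of pixel intensities (rows of equal length).
    feature_size: edge length of one feature, in pixels.
    Returns {(column, row): (mean_intensity, pixel_variance)}.
    """
    n_rows = len(image) // feature_size
    n_cols = len(image[0]) // feature_size
    summaries = {}
    for fy in range(n_rows):
        for fx in range(n_cols):
            pixels = [
                image[y][x]
                for y in range(fy * feature_size, (fy + 1) * feature_size)
                for x in range(fx * feature_size, (fx + 1) * feature_size)
            ]
            summaries[(fx, fy)] = (mean(pixels), pvariance(pixels))
    return summaries


# Hypothetical 4 x 4 pixel image holding four 2 x 2 features
image = [
    [1, 1, 5, 5],
    [1, 1, 5, 7],
    [2, 2, 9, 9],
    [2, 2, 9, 9],
]
summary = summarize_features(image, 2)
print(summary[(1, 0)])
```

Real image-analysis software typically also trims edge pixels or uses a percentile rather than the mean, which makes the summary less sensitive to dust and bright artifacts.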

With flow cytometry based detection systems, a representative fraction of microparticles in each sublot of microparticles can be examined. The individual sublots, also known as subsets, can be prepared so that microparticles within a sublot are relatively homogeneous but differ in at least one distinguishing characteristic from microparticles in any other sublot. Therefore, the sublot to which a microparticle belongs can readily be determined using conventional flow cytometry techniques as described in U.S. Pat. No. 6,449,562. Typically, a laser is directed at individual microparticles and at least three known classification parameter values are measured: forward light scatter (C₁), which generally correlates with size and refractive index; side light scatter (C₂), which generally correlates with size; and fluorescent emission in at least one wavelength (C₃), which generally results from the presence of fluorochrome incorporated into the labeled target sequence. Because microparticles from different subsets differ in at least one of the above listed classification parameters, and the classification parameters for each subset are known, a microparticle's sublot identity can be verified during flow cytometric analysis of the pool of microparticles in a single assay step and in real time. For each sublot of microparticles representing a particular probe, the intensity of the hybridization signal can be calculated, along with signal variance estimations, after performing background subtraction.
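Sublot assignment and per-sublot signal estimation can be sketched as a nearest-profile classification followed by averaging. Assumptions: classification here uses only forward and side scatter (C₁, C₂) against known per-sublot profiles, each event's reporter fluorescence is treated as its hybridization signal, and a single background value is already known. The function names, profiles, and event values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def assign_sublot(params, profiles):
    """Return the sublot whose known classification-parameter profile is
    nearest (Euclidean distance) to the measured parameters."""
    return min(profiles, key=lambda name: math.dist(params, profiles[name]))


def sublot_signals(events, profiles, background):
    """events: iterable of (classification_params, reporter_intensity).

    Returns {sublot: mean reporter intensity minus background}.
    """
    grouped = defaultdict(list)
    for params, reporter in events:
        grouped[assign_sublot(params, profiles)].append(reporter)
    return {name: mean(values) - background for name, values in grouped.items()}


# Hypothetical (forward scatter, side scatter) profiles for two probe sublots
profiles = {"probe_A": (100.0, 20.0), "probe_B": (300.0, 60.0)}
events = [((105.0, 22.0), 50.0), ((95.0, 18.0), 70.0), ((290.0, 58.0), 200.0)]
signals = sublot_signals(events, profiles, background=10.0)
print(signals)
```

In practice the scatter channels are usually scaled before computing distances, since otherwise the channel with the largest numeric range dominates the classification.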

Data Processing and Statistical Analysis

Simultaneous detection of at least 500, 1000, 5000, 10000, 20000, 30000, 40000, 50000, 60000, or more taxa with a high level of confidence can incorporate techniques to de-convolute the signal intensity of numerous probe sets into probability estimates. In some embodiments, the methods, compositions and systems enable detection, in one assay, of the presence or absence of a microorganism in a community of microorganisms, such as an environmental or clinical sample, when the microorganism comprises less than 0.05% of the total population of microorganisms. In some embodiments, detection includes determining the quantity of the microorganism, e.g., the percentage of the microorganism in the total microorganism population. De-convolution techniques can include the incorporation of NC probe pairs into the analysis system and the use of the data to fit the hybridization signals from the QS probe pairs to the hybridization distribution of the NC probe pairs.

De-convolution techniques can allow the detection and quantification of nucleic acids in a sample and, by inference, the detection and quantification of microorganisms in a sample. In one aspect of the present application, a system is provided for determining the presence or quantity of a microorganism in a sample, comprising contacting a sample with a plurality of probes, detecting the hybridization signals of the sample nucleic acids with the probes, and de-convoluting the signals to determine the presence, absence and/or quantity of a particular nucleic acid present in a population of nucleic acids, where the particular nucleic acid is present at less than 0.01% of the total nucleic acid population. In some embodiments, the particular nucleic acid is at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96% or 97% homologous to other nucleic acids in the population.

In some embodiments, the data output from an imaged or scanned sample is de-convoluted and analyzed using the following methods. Using an array as an illustrative example, the hybridization signals are converted to xy-coordinates, with intensity summaries and variance estimates generated for the pixels using commercial software. The data are output in a standard data format such as a CEL file (Affymetrix) or a Feature Report file (NimbleGen).

The hybridization signals undergo background subtraction. Typically, the background intensity is computed independently for each quadrant as the average signal intensity of the least intense 2% of the probes in the quadrant. Other threshold values can also be used, e.g., 0.5%, 1%, 3%, 4%, 5% or 10%. The background intensity is then subtracted from all probes in a quadrant before further computation is performed. This noise removal procedure can be done on a quadrant-by-quadrant basis or across a whole array.
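The per-quadrant background subtraction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the commercial software referred to above: the function name `subtract_background`, the dictionary input layout, rounding the 2% cut-off up to at least one probe, and keeping negative results instead of clipping them are all choices made for this sketch.

```python
import math
from statistics import mean


def subtract_background(quadrants, fraction=0.02):
    """Subtract a per-quadrant background from every probe intensity.

    quadrants: hypothetical input mapping a quadrant label to its raw probe intensities.
    fraction: share of the least intense probes averaged to estimate background.
    """
    corrected = {}
    for label, values in quadrants.items():
        # Number of dimmest probes to average; always at least one.
        n_low = max(1, math.ceil(len(values) * fraction))
        background = mean(sorted(values)[:n_low])
        # Negative results are kept rather than clipped (an assumption of this sketch).
        corrected[label] = [v - background for v in values]
    return corrected
```

Passing all probe intensities of the array under a single key applies the same computation across the whole array instead of quadrant by quadrant.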

In some embodiments, array signals are normalized to allow for the comparison of results achieved in different experiments or for the comparison of replicate experiments. Normalization can be achieved by a number of methods. In some embodiments, reproducibility between different probes for the same target is evaluated using a Position Dependent Nearest Neighbor (PDNN) model as described in Zhang L. et al., A model of molecular interactions on short oligonucleotide microarrays, Nat. Biotechnol. 2003, 21(7):818-821. The PDNN model allows estimation of the sequence-specific noise signal and a non-specific background signal, and thus enables estimation of the true intensity of the probes.

In other embodiments, per-array models of signal and background distributions are created using responses observed from comparison of the PM and MM probe pairs and the internal DNA quantification standard (QS) probe pairs. In some embodiments, the probability that each probe pair is “positive” is determined by calculating a difference score, d, for each probe pair. d can be defined as:

$d = 1 - \left( \frac{PM - MM}{PM + MM} \right) \qquad \text{(Eqn. 1)}$

wherein:

PM=scaled intensity of the perfect match probe;

MM=scaled intensity of the mismatch probe; and,

d=pair difference score.

The value of d can range from 0 to 2. When PM>>MM, d approaches 0; when PM=MM, d=1; and when PM<<MM, d approaches 2.
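Eqn. 1 translates directly into code. The following Python sketch assumes that the scaled PM and MM intensities sum to a positive value (the source does not say how non-positive sums are handled, so this sketch simply rejects them); the function name `difference_score` is hypothetical.

```python
def difference_score(pm, mm):
    """Pair difference score d = 1 - (PM - MM) / (PM + MM), per Eqn. 1.

    pm, mm: scaled PM and MM intensities, assumed here to sum to a positive value.
    """
    total = pm + mm
    if total <= 0:
        raise ValueError("PM + MM must be positive to compute d")
    return 1.0 - (pm - mm) / total


# Strong PM, equal PM and MM, and strong MM, respectively.
for pm, mm in [(990.0, 10.0), (500.0, 500.0), (10.0, 990.0)]:
    print(pm, mm, round(difference_score(pm, mm), 3))
```

The three pairs give d values of about 0.02, 1.0 and 1.98, matching the limiting behavior stated above.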

In some embodiments, the internal DNA quantification standard (QS) and negative control (NC) probe pairs are binned and sorted by attributes of the probes. Examples of probe attributes that can be used in the embodiments disclosed herein include, but are not limited to: binding energy; base composition, including A+T count, G+C count, and T count; sequence complexity; cross-hybridization binding energy; secondary structure; hairpin-forming potential; melting temperature; and probe length. These attributes can affect the hybridization properties of the probes; for example, A+T count may affect hydrogen bonding of the probe, and T count may affect the length and base composition of the fragments produced by the use of DNase. Fragmentation with other enzyme systems may be influenced by the composition of other bases.

In some embodiments, QS and NC probe pairs are binned and sorted based on each probe's A+T count and T count. For each bin (A+T count by T count), the d values from the NC probe pairs are fit to a normal distribution to derive its location (mean) and scale (standard deviation). Then, the d values from the QS probe pairs are fit to a gamma distribution to derive its scale and shape parameters. For each array, multiple density plots are produced by this process. In general, even one extra T can result in an appreciable difference in the gamma scale parameter of the probe.
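The binning and per-bin fitting can be sketched in Python as below. The source does not state which fitting procedure is used, so this sketch substitutes maximum-likelihood estimates for the normal model (population mean and standard deviation) and a method-of-moments estimate for the gamma model (shape = mean²/variance, scale = variance/mean). The input layout of `(probe_sequence, d)` tuples and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev, pvariance


def probe_bin(sequence):
    """Bin key (A+T count, T count) for a probe sequence."""
    seq = sequence.upper()
    return (seq.count("A") + seq.count("T"), seq.count("T"))


def fit_bins(nc_pairs, qs_pairs):
    """Fit per-bin NC (normal) and QS (gamma) models of the difference score d.

    nc_pairs, qs_pairs: iterables of (probe_sequence, d) tuples (hypothetical layout).
    Returns {bin: {"norm": (mean, sd), "gamma": (shape, scale)}} for bins that have
    at least two NC and two QS values with non-zero spread.
    """
    nc_by_bin, qs_by_bin = defaultdict(list), defaultdict(list)
    for seq, d in nc_pairs:
        nc_by_bin[probe_bin(seq)].append(d)
    for seq, d in qs_pairs:
        qs_by_bin[probe_bin(seq)].append(d)

    params = {}
    for key in nc_by_bin.keys() & qs_by_bin.keys():
        nc, qs = nc_by_bin[key], qs_by_bin[key]
        if len(nc) < 2 or len(qs) < 2:
            continue
        mu, sd = mean(nc), pstdev(nc)
        m, v = mean(qs), pvariance(qs)
        if sd == 0 or v == 0 or m <= 0:
            continue
        # Method-of-moments gamma fit: shape = m^2 / v, scale = v / m.
        params[key] = {"norm": (mu, sd), "gamma": (m * m / v, v / m)}
    return params
```

Bins lacking enough NC or QS probes are skipped here; how such sparse bins are handled in practice is not specified in the source.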

The parameters derived from the gamma and normal distributions are used to derive a pair response score, r, for each probe pair. r is an indicator of the probability that a probe pair is positive, i.e., the probability that a probe pair is responsive to the target sequence. r may be defined as:

$r = \frac{pdf_{\gamma}(X = d)}{pdf_{\gamma}(X = d) + pdf_{norm}(X = d)} \qquad \text{(Eqn. 2)}$

where:

r = response score measuring the potential that a specific probe pair is binding a target sequence rather than producing a background signal, i.e., the probability of the probe pair being positive for the specific target sequence;

pdf_γ(X=d) = probability density of d under the gamma distribution estimated for the target class ATx Ty (probes with an A+T count of x and a T count of y); and,

pdf_norm(X=d) = probability density of d under the normal distribution estimated for the target class ATx Ty.

r can range from 0 to 1. r approaches 1 when PM>>MM, and r approaches 0 when PM<<MM.
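Given the per-bin parameters, Eqn. 2 can be evaluated for each probe pair. The Python sketch below computes the gamma and normal densities directly with the standard library (the gamma density in log space to avoid overflow). The function names are hypothetical, and returning 0.0 when both densities underflow to zero is an arbitrary choice of this sketch, not something the source specifies.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma probability density, computed in log space for stability."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def normal_pdf(x, mu, sd):
    """Normal probability density."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def response_score(d, gamma_params, norm_params):
    """Pair response score r per Eqn. 2 for one probe pair.

    gamma_params: (shape, scale) fitted to QS d values in the probe's bin.
    norm_params: (mean, sd) fitted to NC d values in the same bin.
    """
    g = gamma_pdf(d, *gamma_params)
    n = normal_pdf(d, *norm_params)
    if g + n == 0.0:
        # Both densities underflowed; treating this as non-responsive is an assumption.
        return 0.0
    return g / (g + n)
```

With a QS model centered near d = 0.2 and an NC model centered at d = 1.0, a pair with d = 0.2 scores close to 1 and a pair with d = 1.0 scores close to 0.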

Each set of interrogation probe pairs, e.g., an OTU, can be scored based on pair response scores, cross-hybridization relationships, or both. In some embodiments, the system removes data from at least a subset of probe pair sets before making a final call on the presence or quantity of said microorganisms. In some embodiments, the data are removed based on interrogation probe cross-hybridization potential. In some embodiments, the scoring of probe pairs is performed by a two-stage process as discussed below.

For example, a two-stage analysis can be performed wherein only probe pairs that pass a first stage are analyzed in the next stage. In the first stage, the distribution of r across each set of probe pairs, R, is determined. For each set of probe pairs that is associated with an OTU, the r values of all probe pairs are ranked within the set, and the percentage of probe pairs that meet one or more threshold r values is determined. Frequently, three threshold determinations are made at 25% increments across the total range of ranked probe pairs (interquartile Q1, Q2, and Q3); however, any number of threshold determinations or percentage increments can be used. For example, a determination can use one increment at 70%, in which probe pairs must pass a threshold value of 80%.

Typically, to differentiate signal from noise, an OTU is considered to pass Stage 1 if Q1, Q2, and Q3 of the set of probe pairs associated with this OTU meet or exceed the thresholds Q1_(min), Q2_(min), and Q3_(min), respectively. That is, for an OTU to pass Stage 1, at least 75% of the probe pairs in the associated set must have an r value of at least Q1_(min), at least 50% must have an r value of at least Q2_(min), and at least 25% must have an r value of at least Q3_(min). Q1_(min) is at least about 0.5, about 0.55, about 0.6, about 0.65, about 0.7, about 0.75, about 0.8, about 0.82, about 0.84, about 0.86, about 0.88, about 0.90, about 0.91, about 0.92, about 0.93, about 0.94, about 0.95, about 0.96, about 0.97, about 0.98, or about 0.99. Q2_(min) is at least about 0.5, about 0.55, about 0.6, about 0.65, about 0.7, about 0.75, about 0.8, about 0.82, about 0.84, about 0.86, about 0.88, about 0.90, about 0.91, about 0.92, about 0.93, about 0.94, about 0.95, about 0.96, about 0.97, about 0.98, or about 0.99. Q3_(min) is at least about 0.5, about 0.55, about 0.6, about 0.65, about 0.7, about 0.75, about 0.8, about 0.82, about 0.84, about 0.86, about 0.88, about 0.90, about 0.91, about 0.92, about 0.93, about 0.94, about 0.95, about 0.96, about 0.97, about 0.98, about 0.99, about 0.992, about 0.994, about 0.996, about 0.998, or about 0.999. In some embodiments, Q1_(min), Q2_(min), and Q3_(min) are determined empirically from spike-in experiments. For example, Q1_(min), Q2_(min), and Q3_(min) are chosen to allow a 2 pM amplicon concentration to pass. In some embodiments, Q1_(min), Q2_(min), and Q3_(min) are 0.98, 0.97, and 0.82, respectively. These threshold numbers were empirically derived using DNase to fragment the sample sequences. Since DNase has a T-bias, the use of other enzymes can require a shift in the threshold numbers, which can likewise be derived empirically.
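A minimal Python sketch of the Stage 1 test is shown below, assuming the interquartile values are computed with the standard library's `statistics.quantiles` using inclusive interpolation; the source does not specify an interpolation rule, and the function name `passes_stage1` and the example r values are hypothetical.

```python
from statistics import quantiles


def passes_stage1(r_values, q_min):
    """Stage 1 test for one OTU.

    r_values: pair response scores r for the OTU's probe pairs.
    q_min: (Q1_min, Q2_min, Q3_min) thresholds.
    """
    if len(r_values) < 2:
        return False
    # Q1, Q2, Q3 of the ranked r values (inclusive interpolation is an assumption).
    q1, q2, q3 = quantiles(r_values, n=4, method="inclusive")
    return q1 >= q_min[0] and q2 >= q_min[1] and q3 >= q_min[2]


# Hypothetical OTUs scored with the example thresholds 0.98, 0.97 and 0.82.
thresholds = (0.98, 0.97, 0.82)
print(passes_stage1([0.99, 0.995, 0.985, 0.999, 0.99, 0.992], thresholds))
print(passes_stage1([0.99, 0.40, 0.95, 0.30, 0.85, 0.20], thresholds))
```

The first OTU, whose probe pairs are uniformly responsive, should pass (True); the second, with many weak pairs, should fail (False).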

In the second stage, only the OTUs passing the first stage are considered as potential sources of cross-hybridization. In some embodiments, for each OTU, only probe pairs with r>0.5 (the probe pairs considered likely to be responsive to the target sequence) are further analyzed. In other instances, only probe pairs with r>0.6, 0.7, 0.8, or 0.9 are considered responsive and are further analyzed. Probe pairs that are unlikely to be responsive (i.e., r≤0.5) are not analyzed further, even if their set R was responsive overall. R_(0.5) represents the subset of probe pairs in which all probe pairs have r>0.5. Typically, based on the interquartile Q1, Q2 and Q3 values chosen at Stage 1, most of the probe pairs in the OTUs passing Stage 1 are analyzed. In other embodiments, only the probe pairs with r>0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, or 0.90 are further analyzed.

For each probe pair in the R_(0.5) subset, the count of putatively cross-hybridizing OTUs (i.e., the number of OTUs with which the probe pair can cross-hybridize) is determined. In this process, only the OTUs that have passed Stage 1 are considered as potential sources of cross-hybridization. Each probe pair in the R_(0.5) subset is penalized by dividing its r value by the count of putatively cross-hybridizing OTUs to determine its modified probability of being positive. The modified probability of being positive for a probe pair may be represented by an r_(x) value. r_(x) may be defined as:

$r_{x} = \frac{r}{\lvert S_{1x} \rvert} \qquad \text{(Eqn. 3)}$

where

S₁ = set of OTUs passing Stage 1; and,

S_(1x) = set of OTUs passing Stage 1 with cross-hybridization potential to the given probe pair; the denominator of Eqn. 3 is the number of OTUs in this set.

r_(x) is proportional to the response of the probe pair and to the specificity of the probe pair given the community observed during the first stage. The r_(x) value can range from 0 to 1. For each set of probe pairs associated with an OTU, r_(x) values are calculated for each probe pair and ranked within the set. Interquartile Q1, Q2, Q3 values for the distribution of r_(x) values in each set of probe pairs are determined. The taxon represented by the OTU is considered to be present if Q1 is greater than Q_(x1), Q2 is greater than Q_(x2), or Q3 is greater than Q_(x3). Q_(x1) is at least about 0.5, at least about 0.55, at least about 0.6, at least about 0.65, at least about 0.7, at least about 0.75, at least about 0.8, at least about 0.85, at least about 0.90, at least about 0.95, or at least about 0.97. Q_(x2) is at least about 0.5, at least about 0.55, at least about 0.6, at least about 0.65, at least about 0.7, at least about 0.75, at least about 0.8, at least about 0.85, at least about 0.90, at least about 0.95, or at least about 0.97. Q_(x3) is at least about 0.5, at least about 0.55, at least about 0.6, at least about 0.65, at least about 0.7, at least about 0.75, at least about 0.8, at least about 0.85, at least about 0.90, at least about 0.95, or at least about 0.97. In some embodiments, Q_(x1) is at least 0.66, that is, 75% of the probe pairs in the set of probe pairs have an r_(x) value that is at least 0.66.
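The Stage 2 penalty (Eqn. 3) and the presence call can be sketched together in Python. Assumptions of this sketch, not stated in the source: cross-hybridization potential is supplied as a precomputed mapping from probe ID to the set of OTUs the probe can hybridize with, the probe's own OTU is always counted (so a perfectly specific probe keeps r_(x) = r rather than dividing by zero), and quartiles use inclusive interpolation. All names are hypothetical.

```python
from statistics import quantiles


def stage2_calls(otu_probes, cross_hyb, stage1_otus, q_x, r_cut=0.5):
    """Return the set of Stage 1 OTUs called present after the Stage 2 penalty.

    otu_probes: dict OTU -> dict probe_id -> r value.
    cross_hyb: dict probe_id -> set of OTUs the probe can hybridize with.
    stage1_otus: set of OTUs that passed Stage 1 (S1).
    q_x: (Qx1, Qx2, Qx3) thresholds on the r_x interquartiles.
    """
    present = set()
    for otu in stage1_otus:
        rx_values = []
        for probe_id, r in otu_probes.get(otu, {}).items():
            if r <= r_cut:
                continue  # keep only the R_0.5 subset
            # S1x: Stage 1 OTUs this probe may hybridize with, including its own OTU.
            s1x = (cross_hyb.get(probe_id, set()) | {otu}) & stage1_otus
            rx_values.append(r / len(s1x))  # Eqn. 3
        if len(rx_values) < 2:
            continue
        q1, q2, q3 = quantiles(rx_values, n=4, method="inclusive")
        # Presence rule as stated above: any one interquartile test suffices.
        if q1 > q_x[0] or q2 > q_x[1] or q3 > q_x[2]:
            present.add(otu)
    return present
```

In the hierarchical scoring described below, the same penalty would be applied separately at each taxonomic level, counting only equally ranked taxa in the denominator.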

A two-stage hybridization signal analysis procedure can be performed on hybridization signals from any array or microparticle-generated data set, including data generated from the use of any combination of probes selected using the disclosed methodologies. In some embodiments, the second stage of the procedure penalizes probes based on the number of cross-hybridizations, the intensity of the cross-hybridization signals, or a combination of the two.

The method disclosed herein is useful for hierarchical probe set scoring. An OTU may be present at a node at any hierarchical level on a clustering tree. As used herein, an OTU is a group of one or more organisms, such as a domain, a sub-domain, a kingdom, a sub-kingdom, a phylum, a sub-phylum, a class, a sub-class, an order, a sub-order, a family, a subfamily, a genus, a subgenus, a species, or any cluster. In some embodiments, an R_(0.5) set is collected for each node on the phylogenetic tree and consists of all unique probes from subordinate R_(0.5) sets. For example, when calculating r_(x) values for probe pairs in an R_(0.5) set for an OTU representing an “order,” the count of putatively cross-hybridizing, equally ranked taxa (i.e., “order” nodes) containing at least one sequence with cross-hybridization potential is used as the denominator in Eqn. 3.

In some embodiments, the OTUs at the leaf level (e.g., species, subgenus or genus) are analyzed first. Then each successive level of nodes in the clustering tree is analyzed. In some embodiments, the analysis is performed up to the domain level. In some embodiments, the analysis is performed up to the phylum level. In some other embodiments, the analysis is performed up to the kingdom level. Penalization for cross-hybridization in Eqn. 3 is only performed for probes on the same taxonomic level. All present taxa are quantified using the mean scaled PM probe intensity after discarding the highest and lowest values of the set R (HybScore). In some embodiments, only taxa present at a first level are analyzed further.
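The HybScore quantification described above, a mean of the scaled PM intensities after dropping the single highest and single lowest value, can be sketched as follows. The function name and the requirement of at least three intensities (so that something remains after trimming) are assumptions of this sketch.

```python
def hyb_score(pm_intensities):
    """Mean scaled PM intensity after discarding the highest and lowest values.

    pm_intensities: scaled PM intensities for the probe pairs of a present taxon.
    """
    if len(pm_intensities) < 3:
        raise ValueError("need at least three PM intensities to trim both extremes")
    trimmed = sorted(pm_intensities)[1:-1]
    return sum(trimmed) / len(trimmed)


print(hyb_score([1.0, 100.0, 10.0, 20.0, 30.0]))
```

Here the outlying values 1.0 and 100.0 are discarded, and the remaining 10.0, 20.0 and 30.0 average to 20.0.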

In some embodiments, a summary abundance score is determined. Corrected abundance scores are created based on G+C content and uracil incorporation. Generally, probes with higher G+C content produce a higher hybridization signal, which is typically compensated for by correcting the abundance scores.

The probability of detection for each taxonomic node is determined by summarizing terminal node detection and the breadth of cross-hybridization relationships. Hierarchical probes are scored for evidence of novel organisms based on cluster analysis.

In some embodiments, the system is capable of analyzing other data in conjunction with the data obtained from the analysis of probe hybridization signal strength. In some embodiments, the system can analyze sequencing reaction data, including data obtained with high-throughput sequencing techniques. In some embodiments, the sequencing data are from the same regions of the same highly conserved sequence that are analyzed with probes by the method disclosed herein.

High Capacity Analysis System Applications

Numerous subject-derived samples can be assayed to determine each sample's microbiome composition. With an assay system capable of detecting, in a single assay, the presence and optionally the quantity of at least 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 500,000 or 1,000,000 bacterial or archaeal taxa, a complete picture of a microbial ecosystem, such as the living microorganisms in the ecosystem, can be obtained quickly and at relatively low cost, providing the ability to examine numerous subjects.

The elucidation of a specific microbiome associated with an ecosystem, animal, human, organ system, condition, and the like allows for the generation of a “signature,” “biosignature,” or “fingerprint” of the particular environment sampled; these terms are used interchangeably herein. If the biosignature is from a normal or healthy system or subject, or from a subject free from the condition under examination, then the biosignature can be used as a reference for comparison with later samples from the same or other subjects to monitor for changes that are associated with an abnormal or unhealthy state or condition. For example, if a later biosignature of a subject shows that the microbiome has shifted away from that associated with a healthy pulmonary status, then preemptive measures could be taken to prevent a continued shift.

Similarly, a biosignature of an environment can be compared to a biosignature generated from a pool of samples that represents an average or normal biosignature for a population or collection of environments. For example, a sample from an unhealthy individual or environment could be assayed and the microbial biosignature compared to the biosignature seen in the healthy population at large or in an unaffected environment. If one or more microorganisms are detected in the unhealthy individual that are either not seen in the general population or not seen at the same prevalence, then therapeutic measures can be taken to selectively eliminate or reduce in number the microorganisms associated with the unhealthy state. Once a relationship is known between the prevalence of a particular microorganism or group of microorganisms (e.g., one or more OTUs that consist essentially of viable organisms from a sample) and a disease state, disease progression or treatment response can also be monitored, diagnosed, and/or predicted using the present systems and methods.

Numerous microbiomes of animals or humans can be analyzed with the present systems and methods, including those of the gut, respiratory system, urogenital tract, mammary glands, skin, oral cavity, and auditory canal. Clinical samples such as blood, sputum, nares, feces, and urine can be used with the method. From the analysis of normal individuals and those suffering from a disease or condition, a large database of fingerprints or biosignatures can be assembled. By comparing the biosignatures of healthy and disease-related states, associations can be made as to the influence and importance of individual components of the microbiome.

Once these associations are made, treatments can be designed and tested to alter the composition of the microbiota seen in the disease state. Additionally, by regularly monitoring the microbial composition of an affected organ system in a diseased individual, disease progress or response to therapy can be observed and, if needed, additional therapeutic measures can be taken to alter the microbiome composition to one that is more representative of that seen in a healthy population.

An interesting property of bacteria that has great importance in healthcare, water quality and food safety is quorum sensing. Many bacteria are able to sense the presence of other members of their species or related species and, upon reaching a specific density, start producing various virulence or pathogenicity factors. In other words, the bacteria's gene expression is coordinated as a group. For example, some bacteria produce exopolysaccharides that form what are known as “slime layers.” The secretion of exopolysaccharides can decrease the ability of white blood cells to phagocytize the microorganisms and make the microorganisms more resistant to therapeutics or cleaning agents. Traditional methodologies require the detection of specific gene expression in order to detect or study quorum sensing and other population-induced effects. The present systems and methods can be used to understand the changes that occur in a microbiome that are associated with a given effect, such as biofilm formation or toxin production. Protocols can be developed with the present systems and methods to look for and determine conditions that lead to quorum sensing. For example, testing samples at various time points and under varying conditions can help determine how and when to intervene to prevent or reverse population-induced expression of virulence or pathogenicity factors.

In some embodiments, a method is provided to identify, with the present systems and methods, a new indicator species for an environmental or health condition. The condition can be a normal or healthy state. Alternatively, the indicator species can be for an unhealthy or abnormal condition. To identify a new indicator species, a normal sample is assayed to simultaneously determine the presence or quantity of each OTU associated with all known bacteria, archaea, or fungi; this result is compared to the result of the same simultaneous assay performed on a sample from the environment of the condition, in which the presence or quantity of each OTU associated with all known bacteria, archaea, or fungi is determined. Microorganisms that change in abundance by at least 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 20-fold, 50-fold or 100-fold, either increasing or decreasing, represent putative indicator species for the condition.
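The fold-change comparison described above can be sketched in Python as follows. The abundance dictionaries, the function name, and the small pseudocount (added so that OTUs absent from one sample do not cause division by zero) are assumptions of this sketch; the source does not say how absent OTUs are handled.

```python
def putative_indicators(normal, condition, min_fold=2.0, pseudocount=1e-6):
    """Return OTUs whose abundance changes at least min_fold in either direction.

    normal, condition: dicts mapping OTU -> abundance (hypothetical units) from the
        normal sample and the condition sample.
    Returns a dict OTU -> fold change (condition / normal).
    """
    indicators = {}
    for otu in normal.keys() | condition.keys():
        # The pseudocount keeps the ratio finite when an OTU is absent from one sample.
        before = normal.get(otu, 0.0) + pseudocount
        after = condition.get(otu, 0.0) + pseudocount
        fold = after / before
        if fold >= min_fold or fold <= 1.0 / min_fold:
            indicators[otu] = fold
    return indicators
```

For example, an OTU rising from 10 to 40 (about 4-fold) and one falling from 8 to 2 (about 0.25-fold) would both be flagged at a 2-fold cut-off, while an unchanged OTU would not.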

In other embodiments, methods are provided for identifying indicator species associated with a disease state, disease progression, a treatment regimen, or probiotic administration. In some embodiments, methods are provided for monitoring a change in the environment or health status associated with introducing one or more new microorganisms into a community. For example, measures to increase a particular microorganism's percentage of the gut microbiome in an individual, such as feeding a person yogurt or a food supplement containing L. casei, can be monitored using the present methods and systems. In some embodiments, the presence, absence, relative abundance, and/or quantity of desired and/or undesired live microorganisms in food and food supplements can be evaluated by the methods disclosed herein. For example, the viability over time of microorganisms in a probiotic supplement can be determined and used as a measure of the shelf-life and/or potency of the probiotic supplement.

Combined Analysis

The ability to identify and quantitate the microorganisms in a sample (e.g., the live microorganisms in a sample comprising live and dead microorganisms) can be combined with a gene expression technology, such as a functional gene array, to correlate populations with observed gene expression. Similarly, microbiome composition analysis can be correlated with the presence of chemicals, proteins including enzymes, toxins, drugs, antibiotics or other sample constituents. For instance, nucleic acids isolated from a soil sample can be analyzed to elucidate the microbiome composition (e.g., biosignature) and also to identify expressed genes. In the bare, nutrient-poor soils of the Antarctic, this analysis associated chitinase and mannanase expression with Bacteroidetes and CH₄-related genes with Alphaproteobacteria (Yergeau et al., Environmental microarray analyses of Antarctic soil microbial communities, ISME J. 3:340-351, 2009). Significant correlations were also found between taxon abundances and C- and N-cycle gene abundances. From these data, one can predict that certain organisms or groups of organisms are required for, or account for the majority of, an expected or observed enzymatic or degradative process. For example, members of the Bacteroidetes phylum probably degrade the majority of environmental chitin, a major constituent of the exoskeletons of insects and other arthropods and of fungal cell walls, at the sample locale.

This methodology can be used to identify new antibiotic-producing organisms, even ones that are unculturable. For instance, soil extracts can be tested for antibiotic activity. If a positive extract is found, a sample of the soil from which the extract was prepared can be analyzed for microbial composition and perhaps gene expression. Major constituents of the microbiome could be correlated with antibiotic activity, with the correlation strengthened through gene expression data, allowing one to predict that a particular organism or group of organisms is responsible for the observed antibiotic activity.

In some embodiments, a method is provided for making a prediction about a sample, comprising a) determining microorganism population data as the probability of the presence or absence of at least 100 OTUs of microorganisms in said sample; b) determining expression data for one or more genes expressed by said microorganisms in said sample; and c) using said expression data and population data to make a prediction about said sample. In some embodiments, the prediction entails the identity of a microorganism responsible for a characteristic or condition observed in an environment.

Other combined analysis methods include the use of a diffusion chamber to retain microorganisms in a sample while one or more constituents or parameters of the sample are changed. For instance, the salinity or pH of the sample can be changed abruptly or gradually over time. Following specific time intervals, the microbiome of the sample in the diffusion chamber can be determined. Microorganisms that cannot tolerate the new environmental conditions will die, become reduced in number due to unfavorable conditions or predation, or remain static in their numbers. In contrast, microorganisms that can tolerate the new conditions will at least maintain their numbers or thrive, perhaps becoming a dominant population. Use of a diffusion chamber coupled with a system capable of detecting the presence or quantity of at least 10,000 OTUs can allow the identification of microorganisms that perish or fail to thrive when placed in a new environment. Such microorganisms are termed “transient,” meaning that their percent composition of the microbiome changes quickly. The identification of transient microorganisms can be used to ascertain the time and/or place they were introduced into an environment. Different transient microorganisms can have different half-lives under a particular condition.

Diffusion chambers can also take the form of a semi-permeable capsule, tube, rod, sphere, or other solid or semi-solid object. A microbiome or a select group of bacteria can be placed inside the capsule, which is then sealed and introduced into an environment for a specified period of time. Upon removal, the capsule is opened and the microbiome or select group of bacteria is sampled to ascertain changes in the presence or quantity of the individual constituents. The capsule can be removed once or periodically to sample the microbiome. Alternatively, multiple single-use capsules with identical quantities of the microbiome can be used, each one removed and sampled at a different time point. Microbiomes placed in capsules or other semi-permeable containers can be introduced into a living organism, usually through an orifice, to measure changes to the microbiome composition associated with a particular organ or system environment. For example, a semi-permeable capsule or tube containing a microbiome can be introduced into the gastrointestinal system through the mouth or anus. A microbiome from a healthy individual can be introduced in this manner into an unhealthy individual, such as a patient suffering from Crohn's disease or irritable bowel syndrome, to ascertain the effect of the unhealthy condition on the microbiome associated with a normal, healthy individual. In this manner, drug efficacy and treatment protocols could also be evaluated based on the effects of the gut ecology on a known microbiome.

Low Density-Special Purpose Detection Systems

In some embodiments, probes are selected for constructing special purpose systems, including those with arrays or microparticles. Typically, special purpose “low density” systems are designed for use in a specific environment or for a particular application and usually feature a reduced number of probes, “down-selected” probes, that are specific to organisms that are known or expected to be present in the particular environment, such as those associated with a particular biosignature. In some cases the biosignature is fecal contamination. Typically, a low density system comprises no more than 10, 20, 50, 100, 200, 500, 1,000, 2,000, 5,000 or 10,000 down-selected probes, or 5, 10, 25, 50, 100, 250, 500, 1,000, 2,500 or 5,000 down-selected probe pairs (PM and MM probes). In some embodiments, only 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 probes are used per OTU. In further embodiments, only PM probes are used. Generally, these down-selected probes have robust hybridization signals and few or no cross-hybridizations. In some embodiments, the collection of down-selected probes has a median cross-hybridization potential number of less than 20, 15, 10, 8, 7, 6, 5, 4, 3, 2, or 1 per probe. Frequently the down-selected probes belong to OTUs that have reduced numbers of probes. In some embodiments, the OTUs of a down-selected probe collection have a median number of less than 25, 20, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3 or 2 probes per OTU. Generally, low density systems feature probes that recognize no more than 10, 25, 50, 100, 250, 500, 1,000, 2,000, or 5,000 taxa. For a set number of probes, a number of design strategies can be employed for low density systems. One approach is to maximize the number of OTUs identified, e.g., by using one probe per OTU with no mismatch probes.
Another approach is to select probes based on the desired confidence level. Here, multiple probes for each OTU, along with corresponding mismatch probes, can be required to achieve at least a 95% confidence level for the presence and quantity of each OTU. The probes for a particular low density application can be selected by applying a sample from an appropriate environment to a high density analysis system, e.g., a detection system that can, in a single assay, determine the probability of the presence or quantity of at least 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 250,000, 500,000 or 1,000,000 OTUs of a single domain, such as bacteria, archaea, or fungi, or alternatively, of each known OTU of a single domain. Probes associated with prevalent OTUs can be selected for a low density system. Alternatively, the OTUs seen in a sample of interest can be compared with a control sample and shared OTUs subtracted out, with the probes associated with the remaining OTUs selected for the low density system. Additionally, probes can be selected based on a change in prevalence of OTUs between the environment of interest and a control environment. For example, OTUs that are at least 2-fold, 5-fold, 10-fold, 100-fold or 1,000-fold more abundant in the sample of interest compared to the control sample are included in the down-selected probe set. Using this information, a down-selected array, bead multiplex system or other low density assay system is designed.
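The subtraction and fold-enrichment strategies for choosing probes from high density results can be combined in a short Python sketch. Everything here is illustrative: the dictionary layouts, the function name `down_select`, and the rule of keeping an OTU when it is either absent from the control or at least `min_fold` more abundant are assumptions layered on the strategies described above.

```python
def down_select(interest, control, otu_probes, min_fold=2.0):
    """Pick probes for a low density system from high density results.

    interest, control: dicts mapping OTU -> abundance (hypothetical units) in the
        sample of interest and in the control sample.
    otu_probes: dict mapping OTU -> list of probe IDs on the high density system.
    An OTU is kept if it is detected in the sample of interest and is either absent
    from the control (shared OTUs are subtracted) or at least min_fold more abundant.
    """
    selected = {}
    for otu, abundance in interest.items():
        if abundance <= 0 or otu not in otu_probes:
            continue
        baseline = control.get(otu, 0.0)
        if baseline <= 0 or abundance / baseline >= min_fold:
            selected[otu] = list(otu_probes[otu])
    return selected
```

The returned mapping would then feed the design of the down-selected array or bead set; which of the two criteria to apply, and at what fold threshold, depends on the application.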

“Low density” assay systems can be used to identify select microorganisms and determine the percentage composition of various select microorganisms in relation to each other. Low density assay systems can be constructed using probes selected through the disclosed methodologies. These low density systems can identify at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 500, 1000 or more microorganisms. Representative microorganisms to be identified and optionally quantified are listed in Table 2.

TABLE 2
Representative Microorganisms Recognized by Low Density Assay Systems

Species | Application
Listeria monocytogenes | Food safety, environmental surveillance of food processing plants
Salmonella enterica subsp. enterica serovar Enteritidis | Food safety, environmental surveillance of food processing plants
Pseudomonas aeruginosa | Pulmonary health

Low density assay systems are useful for numerous environmental and clinical applications. Exemplary applications are listed in Table 2. Medical conditions that can be identified, diagnosed, prognosed, tracked, or treated based on data obtained with a low density system include, but are not limited to, cystic fibrosis, chronic obstructive pulmonary disease, Crohn's disease, irritable bowel syndrome, cancer, rhinitis, stomach ulcers, colitis, atopy, asthma, neonatal necrotizing enterocolitis, obesity, periodontal disease, and any disease or disorder caused by, aggravated by, or related to the presence, absence or population change of a microorganism. Through the judicious selection of OTUs to be included in a system, the system becomes a diagnostic device capable of diagnosing one or more conditions or diseases with a high level of confidence, producing very low rates of false positive or false negative readings.

In some embodiments, the low density systems also feature confirmatory probes that are specific (complementary) for genes or sequences expressed in specific organisms. For example, probes for the caf1 virulence gene of Yersinia pestis and the zonula occludens toxin (zot) gene of Vibrio cholerae can be included together with confirmatory probes for Y. pestis or V. cholerae.

Kits

As used herein, a “kit” refers to any delivery system for delivering materials or reagents for carrying out a method disclosed herein. In the context of assays, such delivery systems include systems that allow for the storage, transport, or delivery of arrays or beads with probes, reaction reagents (e.g., probes, enzymes, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the assay, etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials for the assays disclosed herein.

In one aspect of the present application, kits for analysis of nucleic acid targets are provided. According to some embodiments, a kit includes a plurality of probes capable of determining the presence or quantity of over 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 30000, 40000, 50000 or 60000 different OTUs in a single assay. Such probes can be coupled to, for example, an array or a plurality of microbeads. In some aspects, a kit comprises at least 5, 10, 15, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000 or 2000000 interrogation probes selected using the disclosed methodologies and/or for use in the identification and/or comparison of a biosignature of one or more samples.

The kit can also include reagents for sample processing, including one or more agents described herein, alone or in combination. In some embodiments, the reagents comprise an agent that selectively modifies nucleic acid of dead cells. In some embodiments, the reagents comprise reagents for the PCR amplification of sample nucleic acids, including primers to amplify regions of a highly conserved sequence, such as regions of the 16S rRNA gene. In some embodiments, the reagents comprise reagents for the direct labeling of RNA, such as rRNA. In further embodiments, the kit includes instructions for using the kit. In other embodiments, the kit includes a password or other permission for electronic access to a remote data analysis and manipulation software program. Such kits have a variety of uses, including environmental monitoring, diagnosing disease, monitoring disease progress or response to treatment, and identifying a contamination source and/or the presence, absence, or amount of one or more contaminants.

Computer Implemented Methods

FIG. 7 illustrates an example of a suitable computing system environment or architecture in which computing subsystems may provide processing functionality to execute software embodiments of the present application, including probe selection, analysis of samples, and remote networking. The method or system disclosed herein may also be operational with numerous other general purpose or special purpose computing systems, including personal computers, server computers, hand-held or laptop devices, multiprocessor systems, and the like.

The method or system may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. The method or system may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.

With reference to FIG. 7, an exemplary system for implementing the method or system includes a general purpose computing device in the form of a computer 102.

Components of computer 102 may include, but are not limited to, a processing unit 104, a system memory 106, and a system bus 108 that couples various system components, including the system memory, to the processing unit 104.

Computer 102 typically includes a variety of computer readable media. Computer readable media include both volatile and nonvolatile media, removable and non-removable media, and may comprise computer storage media. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.

The system memory 106 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 110 and random access memory (RAM) 112. A basic input/output system 114 (BIOS), containing the basic routines that help to transfer information between elements within computer 102, such as during start-up, is typically stored in ROM 110. RAM 112 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 104. FIG. 7 illustrates operating system 132, application programs 134 such as sequence analysis, probe selection, signal analysis and cross-hybridization analysis programs, other program modules 136, and program data 138.

The computer 102 can also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 7 illustrates a hard disk drive 116 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 118 that reads from or writes to a removable, nonvolatile magnetic disk 120, and an optical disk drive 122 that reads from or writes to a removable, nonvolatile optical disk 124 such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 116 is typically connected to the system bus 108 through a non-removable memory interface such as interface 126, and the magnetic disk drive 118 and optical disk drive 122 are typically connected to the system bus 108 by a removable memory interface, such as interface 128 or 130.

The drives and their associated computer storage media discussed above and illustrated in FIG. 7 provide storage of computer readable instructions, data structures, program modules and other data for the computer 102. In FIG. 7, for example, hard disk drive 116 is illustrated as storing operating system 132, application programs 134, other program modules 136, and program data 138. A user may enter commands and information into the computer 102 through input devices such as a keyboard 140 and a mouse, trackball or touch pad 142. These and other input devices are often connected to the processing unit 104 through a user input interface 144 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port or a universal serial bus (USB). A monitor 158 or other type of display device is also connected to the system bus 108 via an interface, such as a video interface or graphics display interface 156. In addition to the monitor 158, computers can also include other peripheral output devices such as speakers (not shown) and a printer (not shown), which can be connected through an output peripheral interface (not shown).

The computer 102 can be integrated into an analysis system, such as a microarray or other probe system described herein. Alternatively, the data generated by an analysis system can be imported into the computer system using various means known in the art.

The computer 102 can operate in a networked environment using logical connections to one or more remote computers or analysis systems. The remote computer can be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 102. The logical connections depicted in FIG. 7 include a local area network (LAN) 148 and a wide area network (WAN) 150, but can also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. When used in a LAN networking environment, the computer 102 is connected to the LAN 148 through a network interface or adapter 152. When used in a WAN networking environment, the computer 102 typically includes a modem 154 or other means for establishing communications over the WAN 150, such as the Internet. The modem 154, which can be internal or external, can be connected to the system bus 108 via the user input interface 144 or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 102, or portions thereof, can be stored in the remote memory storage device.

In further aspects of the present application, computer-implemented methods are provided for analyzing the presence or quantity of over 20, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000 or 60,000 different OTUs in a single assay. In some embodiments, computer executable logic is provided for determining the presence or quantity of one or more microorganisms in a sample, comprising: logic for analyzing intensities from a set of probes that selectively bind each of at least 20, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000 or 60,000 unique and highly conserved polynucleotides, and for determining the presence of at least 97% of all species present in said sample with at least a 90%, 95%, 96%, 97%, 98%, 99% or 99.5% confidence level.

In some embodiments, computer executable logic is provided for determining the probability that one or more organisms, from a set of different organisms, are present in a sample. The computer logic comprises: a process or instructions for determining the likelihood that individual interrogation probe intensities are accurate, based on comparison with intensities of negative control probes and positive control probes; a process or instructions for determining the likelihood that an individual OTU is present, based on intensities of interrogation probes from OTUs that pass a first quantile threshold; and a process or instructions for penalizing one or more OTUs that have passed the first quantile threshold based on their potential for cross-hybridizing with other probes that have also passed the first quantile threshold.

In some embodiments, computer executable logic is provided for determining the presence of one or more microorganisms in a sample. The logic allows for the analysis of a set of at least 1000 different interrogation perfect match probes. The logic further provides for discarding information from at least 10% of the interrogation perfect match probes in the process of making the determination. In some embodiments, the computer executable logic is stored on computer readable media and represents a computer software product.

In some embodiments, computer software products are provided wherein computer executable logic embodying aspects of the invention is stored on computer media such as hard drives or optical drives. In some embodiments, the computer software products comprise instructions that, when executed, perform the methods described herein for determining candidate probes.

In some further embodiments, computer systems are provided that can perform the methods of the inventions. In some embodiments, the computer system is integrated into and is part of an analysis system, such as a flow cytometer or a microarray imaging device. In other embodiments, the computer system is connected to or ported to an analysis system. In some embodiments, the computer system is connected to an analysis system by a network connection. In one example of a system that employs a network connection, a sample is imaged using a commercially available imaging system and software. The data are output in a standard data format such as a CEL file (AFFYMETRIX®) or a Feature Report file (NIMBLEGEN®). The data are then sent to a remote or central location for analysis using a method disclosed herein. In some embodiments, a standardized analysis is performed, providing signal normalization, OTU quantification, and visual analytics. In other embodiments, a customized analysis is performed using a fixed protocol designed for the user's particular needs. In still other embodiments, a user-configurable analysis is used, including a protocol that allows the user to adjust at least one variable before each analysis run.

After processing, the results are stored in an exchangeable binary format for later use or sharing. Additionally, hybridization scores and OTU probability values can be exported to a tab-delimited file or in a format compatible with UniFrac (Lozupone, et al., UniFrac—an online tool for comparing microbial community diversity in a phylogenetic context, BMC Bioinformatics, 7, 371; 2006) for further statistical analysis of the detected sample communities.

In some embodiments, multiple interactive views of the data are available, including taxonomic trees, heatmaps, hierarchical clustering, parallel coordinates (time series), bar plots, and multidimensional scaling scatterplots. In some embodiments, the taxonomy tree displays the mean intensities for each detected OTU and displays the leaves of the tree as a heatmap of samples. The tree can be dynamically pruned by filtering out OTUs below a certain intensity or probability threshold. Additionally, the tree can be summarized at any level from phylum to subfamily. In other embodiments, the user can hierarchically cluster both OTUs and samples using any of the standard distance and linkage methods from the integrated C Clustering Library (de Hoon, et al., Open source clustering software, Bioinformatics, 20, 1453-1454; 2004), and the resulting dendrograms are displayed in a secondary heatmap window. In some embodiments, a third window is provided that displays interactive bar plots of differential OTU intensities to facilitate pairwise comparison of samples. For any two samples, the height of the difference bars displays either the absolute or relative difference in mean intensity between OTUs. The bars can be grouped and sorted along the horizontal axis by any taxonomic rank for easy identification and comparison. Synchronized selection and filtering allow users to navigate seamlessly between multiple views of the data. For example, users can select a cluster in the hierarchical clustering window and simultaneously view the selected organisms in the taxonomy tree, immediately revealing both their phylogenetic and environmental relationships. In further embodiments, the data from the analysis system, e.g., a microarray imaging device or flow cytometer, can be co-analyzed and displayed with high-throughput sequencing data. In some embodiments, for each organism identified as present in the sample, the user is able to view a list of other environments where that organism is found.

In some embodiments, the screen displays are dynamic and synchronized to allow the selection or filtration of OTUs, with changes to any view simultaneously reflected in all other views. Additionally, OTUs confirmed by 16S rRNA gene, 18S rRNA gene, or 23S rRNA gene sequencing can be co-displayed in all views.

Business Methods

In some aspects of the present application, a business method is provided wherein a client images an array or scans a lot of microparticles and sends a file containing the data to a service provider for analysis. The service provider analyzes the data and provides a report to the client in return for financial compensation. In some embodiments, the client has access to the service provider's analysis system and can manipulate and adjust the analysis parameters or the display of the results.

In another aspect of the present application, a business method is provided wherein a client sends a sample to be processed, imaged or scanned, and the data analyzed for the presence or quantity of organisms. The service provider sends a report to the client in return for financial compensation. In some embodiments, the client has access to a suite of data analysis and display programs for further analysis and viewing of the data. In further embodiments, the service provider first provides a system or kit to the client. The kit can include a system to assay a majority or the entirety of the microbiome present, or the system can contain “down-selected” probes designed for particular applications. After sample processing and imaging, the client sends the data for analysis by the service provider. In some embodiments, the client report is electronic. In other embodiments, the client is provided access to a suite of data analysis and display programs for further viewing, manipulation, comparison and analysis of the data. In some embodiments, the client is provided access to a proprietary database in which to compare results. In other embodiments, the client is provided access to one or more public databases, or a combination of private and public databases, for the comparison of results. In some embodiments, the proprietary database includes the pooled results (fingerprints, biosignatures) for normal samples or the pooled results from particular abnormal situations such as a disease state. In some embodiments, the biosignatures are continuously and automatically updated upon receipt of a new sample analysis.

In some embodiments, the database further comprises highly conserved sequence listings. In some embodiments, the database is updated automatically as new sequence information becomes available, for instance, from the National Institutes of Health's Human Microbiome Project. In further embodiments, probe sets are automatically updated based on the new sequence information. Continuous upgrading of the sequence information and refinement of the probe sets allow for increasing accuracy and resolution in determining the composition of microbiomes and the quantity of their individual constituents. In some embodiments, the system compares earlier microbiome biosignatures with later microbiome biosignatures from the same or substantially similar environments and analyzes the changes in probe set composition and hybridization signal analysis parameters for information that is useful in improving or refining the discrimination between related OTUs, the identification and quantification of microbiome constituents, or the accuracy of the determinations.

In some embodiments, the database compiles information about specific microbiomes, for example, the microbiota associated with healthy and unhealthy human intestinal microflora, including the age, gender and general health status of the host, the geographical location of the host, the host's diet (e.g., Western, Asian or vegetarian), water source, the host's occupation or social status, and the host's housing status.

In some embodiments, the reference healthy/normal signatures for adults, male and female, and children can be used as benchmarks to identify presymptomatic and symptomatic disease states, response to treatments/therapies, infection, and/or secondary infection associated with disease. In some embodiments, the client is provided with a diagnosis or treatment recommendation based on the comparison between the client's sample microbiome and one or more reference microbiomes.

In some embodiments, the present application contemplates using 454 pyrosequencing for detection of OTUs in a sample. The OTUs are detected using barcoded amplicons. The amplification can occur on beads. 454 sequencing protocols are described in, for example, Poinar H N, et al., Science, 2006 Jan. 20; 311(5759): 392-4, which is incorporated herein by reference.

EXAMPLES

The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion. The present examples, along with the methods described herein, are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.

Example 1 Exclusion of Dead Cell DNA from Detection

FIG. 1 illustrates one non-limiting embodiment of the present application. Briefly, a test sample suspected of containing a mixture of vegetative cells and live and dead bacterial endospores is first heat shocked at 80° C. for 15 minutes (Step 1). This step eliminates any vegetative cells present in the sample, whereas spores remain unaffected. The sample is then treated with PMA and incubated in the dark. Thereafter, the sample is exposed to visible light for 5 minutes so that only the DNA from dead cells is cross-linked. In general, excess PMA is removed from a sample after the treatment step and prior to or as part of subsequent nucleic acid extraction steps, such as by centrifugation and washing. Following the initial PMA treatment, the sample is concentrated by centrifugation and the fluorescence of the resulting reaction mixture, which contains dead cells and spores, is measured. The sample is then divided into two aliquots. The first aliquot is heated to 100° C. for 18 minutes (Step 2a) to kill all microbes, including endospores, and the fluorescence of the reaction mixture is measured as before. The Step 2a fluorescence is subtracted from the Step 1 fluorescence to determine the total viable spores in the test sample. In Step 2b, qPCR is used to provide confirmatory quantitation of the total viable spores by detecting DNA concentration, since the PMA-intercalated DNA from dead cells and spores is unavailable for PCR amplification. In some embodiments, the total detection time, including the PCR reaction, for low-biomass samples (1 to 10³ spores per reaction) will be within 60 minutes.

A simplified workflow of the PMA-Microarray approach is shown in FIG. 2. Briefly, samples are collected according to established sampling protocols. Each sample is then divided into three portions that are processed separately. The first portion is treated with PMA and exposed to light, and DNA is extracted using the Maxwell 16 automated system (Promega Corp.). From the second portion, DNA is extracted without PMA treatment using an automated DNA extraction system.

In some embodiments, DNA from both the first and second portions is then used for 16S rRNA gradient PCR, the product of which is further used for PhyloChip analysis; 16S rRNA qPCR is used to measure the total microbial concentration with and without PMA treatment; and the third portion is used for microscopy analysis. The BacLight stain (Invitrogen Corp.) is used to distinguish between viable and nonviable cells present in a sample: cells that stain green are viable and cells that stain red are nonviable. In some embodiments, these results are then correlated with the PhyloChip and qPCR results.
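One way the paired qPCR measurements could be compared is sketched below: threshold cycle (Ct) values are converted to 16S rRNA gene copy numbers through a log-linear standard curve, and the copies remaining after PMA treatment are expressed as a fraction of the copies in the untreated portion. The function names and the default slope (-3.32, roughly 100% amplification efficiency) and intercept (38.0) are hypothetical placeholders, not values from this application; in practice they would be fitted to a dilution series run alongside the samples.

```python
def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Convert a qPCR threshold cycle (Ct) to gene copies.

    Assumes a standard curve of the form Ct = slope * log10(copies) + intercept.
    """
    return 10 ** ((ct - intercept) / slope)


def pma_viable_fraction(ct_pma: float, ct_untreated: float,
                        slope: float = -3.32, intercept: float = 38.0) -> float:
    """Ratio of 16S rRNA gene copies with vs. without PMA treatment.

    PMA-treated DNA is taken to represent viable cells only, while untreated
    DNA represents viable plus dead cells. The default slope and intercept are
    hypothetical placeholders for a sample-specific standard curve.
    """
    viable_copies = copies_from_ct(ct_pma, slope, intercept)
    total_copies = copies_from_ct(ct_untreated, slope, intercept)
    return viable_copies / total_copies


if __name__ == "__main__":
    # A PMA-treated portion that amplifies ~3.3 cycles later than the
    # untreated portion retains roughly one tenth of the amplifiable copies.
    print(round(pma_viable_fraction(ct_pma=30.0, ct_untreated=26.68), 3))
```

Because only the Ct difference matters for the ratio, the intercept cancels out; it is needed only when absolute copy numbers are reported.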

In some embodiments, the methods are used for the detection and identification of only viable microbes in clean rooms or satellites, such as for the purposes of planetary protection.

Example 2 PMA Method

An objective was to evaluate a rapid and sensitive spore detection concept for estimating viable microbial spores.

Real-time quantitative PCR (qPCR) methods can be used to estimate microbial populations. The method described here combines qPCR with the fluorescent DNA intercalating agent propidium monoazide (PMA), which can only penetrate the membranes of dead cells; this combination is referred to as the “PMA-method.” The present example describes the development of a reliable and efficient method for extracting DNA from spores. The spore structure is complex and made up of several layers that impart resistance to extreme environmental conditions. These layers, which include the cortex and coat structures, are probably together responsible for the difficulty of releasing DNA from spores. Using a urea-based extraction buffer, it was possible to degrade the outer spore coats. With the spore coats removed, the spores were more susceptible to lytic degradation. After lytic degradation, DNA extraction from both viable and dead spores was consistent in yield and amplification efficiency. Several alternative DNA extraction methods were tried, including the method of Rawsthorne et al. (Rawsthorne and Phister 2009, Letters in Applied Microbiology, 49: 652-654), which utilized DTT, and were found to be less efficient.

DNA intercalating agents have been used to selectively distinguish between viable and dead bacterial cells, as they can only penetrate the membranes of dead cells (Rudi et al. 2005, Applied and Environmental Microbiology, 71(2): 1018-1024; Nogva et al. 2003, BioTechniques, 34: 804-813). Once the DNA of dead cells is cross-linked, it is unavailable for PCR amplification (Nocker et al. 2007, Applied and Environmental Microbiology, 73(16): 5111-5117). Recently, Rawsthorne et al. (2009) demonstrated that PMA can be used to selectively distinguish between viable and non-viable B. subtilis spores.

DNA isolated from E. coli was treated with PMA to show that PMA intercalates into DNA and makes it unavailable for PCR amplification (FIG. 3).

Using the method of Rawsthorne et al. (2009), B. pumilus SAFR-032 spores were heat inactivated at 121° C. for 15 min in two consecutive cycles. DNA was extracted using both the urea-based and DTT-based methods. Compared with DNA from viable spores, DNA after one and two consecutive autoclave cycles was significantly degraded, and the remaining DNA was barely detectable by qPCR. This indicates that PMA pre-treatment of thermally inactivated B. pumilus spores may not be necessary when rigorous time/temperature combinations are used.

Alternative killing treatments, including a lower heating temperature (90° C.) and UV exposure, were used to determine how the DNA from B. pumilus is affected and whether PMA treatment is necessary. FIGS. 5 and 6 show plate counts compared with qPCR results. Based on the qPCR results, if these alternative treatments are used, spores will have to be treated with PMA because their DNA is not degraded.

A comprehensive census of the viable microbes on spacecraft surfaces can be used to maximize detection of resident microbiota. For example, PMA-based quantitation can be used to elucidate the viable microbial community that could survive to reach other planets.

Example 3 Evaluation of PMA-Treated Sample by Phylogenetic Array

A phylogenetic array is fabricated with some of the organism-specific and OTU-specific 16S rRNA probes selected by the methods described herein and in co-pending international application PCT/US2010/040106. The phylogenetic array in this example consists of 1,016,064 probe features, arranged as a grid of 1,008 rows and 1,008 columns. Of these features, ~90% are oligonucleotide perfect match (PM) or mismatch (MM) probes with exact or inexact complementarity, respectively, to 16S rRNA genes. In general, MM probes have one or more sequence differences with respect to PM probes, resulting in one or more mismatches with a target sequence. Each PM probe is paired with an MM control probe to distinguish target-specific hybridization from background and non-target cross-hybridization. The remaining probes are used for image orientation, normalization controls, or pathogen-specific signature amplicon detection using additional targeted regions of a chromosome. Each high-density 16S rRNA gene microarray is designed with additional probes that (1) target amplicons of prokaryotic metabolic genes spiked into the 16S rRNA gene amplicon mix in defined quantities just prior to fragmentation and (2) are complementary to pre-labelled oligonucleotides added to the hybridization mix. The first control collectively tests the efficiency of target fragmentation, labeling by biotinylation, array hybridization, and staining/scanning. It also allows the overall fluorescence intensity to be normalized across all the arrays in an experiment. The second control directly assays hybridization, staining and scanning.

A microbial sample is treated with PMA and incubated in the dark. Thereafter, the sample is exposed to visible light for 5 minutes so that only the DNA from dead cells is cross-linked. Following the initial PMA treatment, the sample is concentrated by centrifugation, and DNA is extracted using the Maxwell 16 automated system (Promega Corp.). The extracted DNA is PCR amplified and hybridized to the phylogenetic array to detect, and optionally quantify, OTUs represented by viable cells and/or spores in the sample.

Complementary targets to the probe sequences hybridize to the array, and fluorescent signals are captured as pixel images using standard AFFYMETRIX® software (GeneChip Microarray Analysis Suite, version 5.1), which reduces the data to an individual signal value for each probe; the result is typically exported as a human-readable CEL file. Background probes are identified from the CEL file as those producing intensities in the lowest 2% of all intensities. The average intensity of the background probes is subtracted from the fluorescence intensity of all probes. The noise value (N) reflects the variation in pixel intensity signals observed by the scanner as it reads the array surface. For each identified background probe, the standard deviation of its pixel intensities is divided by the square root of the number of pixels comprising that feature. The average of the resulting quotients is used for N in the calculations described below.
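The global background subtraction and noise estimate described above could be sketched as follows, assuming each probe has already been reduced to a summary intensity together with the standard deviation and pixel count of its feature (as recorded in a CEL file). The `Feature` record and function name are hypothetical, and rounding the 2% cut-off down (with a minimum of one probe) is an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feature:
    intensity: float  # summarized probe signal, e.g., the CEL "MEAN" value
    stdev: float      # standard deviation of the feature's pixel intensities
    npixels: int      # number of pixels comprising the feature


def background_and_noise(features: List[Feature], fraction: float = 0.02
                         ) -> Tuple[float, float, List[float]]:
    """Return (mean background, noise N, background-subtracted intensities)."""
    # Background probes: the least intense `fraction` of all probes (at least one).
    k = max(1, int(len(features) * fraction))
    background = sorted(features, key=lambda f: f.intensity)[:k]
    mean_background = sum(f.intensity for f in background) / k
    # N: average over the background probes of stdev / sqrt(pixel count).
    noise = sum(f.stdev / math.sqrt(f.npixels) for f in background) / k
    corrected = [f.intensity - mean_background for f in features]
    return mean_background, noise, corrected
```

For example, `background_and_noise([Feature(float(i), 10.0, 25) for i in range(1, 101)])` treats the two weakest probes as background, giving a background of 1.5 and N = 10/√25 = 2.0.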

Using previous methods, probe pairs are scored as positive when they meet two criteria: (i) the fluorescence intensity of the perfect match probe is at least 1.3 times greater than the intensity of the mismatch control probe, and (ii) the difference in intensity, PM minus MM, is at least 130 times greater than the squared noise value (> 130 N²). The positive fraction (PosFrac) is calculated for each probe set as the number of positive probe pairs divided by the total number of probe pairs in the probe set. An OTU is considered ‘present’ when the PosFrac for its corresponding probe set is > 0.92 (based on empirical data from clone library analyses). Replicate arrays can be used collectively in determining the presence of each OTU by requiring each to exceed a PosFrac threshold. Present calls are propagated upwards through the taxonomic hierarchy by considering any node (subfamily, family, order, etc.) as ‘present’ if at least one of its subordinate OTUs is present.
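A minimal sketch of the PosFrac scoring just described, assuming background-subtracted PM and MM intensities are available for each probe pair and a noise value N for each array. The thresholds are the values given in the text (1.3, 130 and 0.92); the function names and data layout are illustrative only.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[float, float]  # (PM intensity, MM intensity)


def pair_is_positive(pm: float, mm: float, noise: float,
                     ratio_threshold: float = 1.3,
                     diff_multiplier: float = 130.0) -> bool:
    """Apply criteria (i) and (ii) to one probe pair.

    The ratio test is written as pm > ratio_threshold * mm, which matches
    PM/MM > ratio_threshold whenever MM is positive.
    """
    return pm > ratio_threshold * mm and (pm - mm) > diff_multiplier * noise ** 2


def pos_frac(pairs: Sequence[Pair], noise: float) -> float:
    """Fraction of positive probe pairs in one probe set."""
    positives = sum(1 for pm, mm in pairs if pair_is_positive(pm, mm, noise))
    return positives / len(pairs)


def otu_present(replicates: Iterable[Tuple[Sequence[Pair], float]],
                threshold: float = 0.92) -> bool:
    """Present only if the probe set exceeds the PosFrac threshold on every replicate.

    Each replicate is given as (probe pairs for this OTU, noise N of that array).
    """
    reps = list(replicates)
    return bool(reps) and all(pos_frac(pairs, noise) > threshold
                              for pairs, noise in reps)
```

With 12 positive pairs out of 13 (PosFrac ≈ 0.923) the OTU is called present; with 11 out of 12 (≈ 0.917) it is not.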

Hybridization intensity is the measure of OTU abundance and is calculated in arbitrary units for each probe set as the trimmed average (maximum and minimum values removed before averaging) of the PM minus MM intensity differences across the probe pairs in a given probe set. All intensities < 1 are shifted to 1 to avoid errors in subsequent logarithmic transformations.
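The trimmed-average intensity could be computed as in the sketch below. Two details are assumptions not stated in the text: exactly one maximum and one minimum difference are removed (with no trimming for probe sets of fewer than three pairs), and the floor of 1 is applied to the final per-probe-set value rather than to each difference.

```python
import math
from typing import Sequence, Tuple


def hybridization_intensity(pairs: Sequence[Tuple[float, float]]) -> float:
    """Trimmed mean of PM - MM differences for one probe set, floored at 1."""
    diffs = sorted(pm - mm for pm, mm in pairs)
    if len(diffs) > 2:
        # Drop the single largest and single smallest difference.
        diffs = diffs[1:-1]
    value = sum(diffs) / len(diffs)
    # Shift values below 1 up to 1 so that a later log transform is defined.
    return max(value, 1.0)


if __name__ == "__main__":
    probe_set = [(120.0, 20.0), (90.0, 30.0), (400.0, 10.0), (70.0, 60.0)]
    print(math.log2(hybridization_intensity(probe_set)))
```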

Example 4 Phylogenetic Array Analysis

Following sample preparation, application, incubation and washing using standard techniques, phylogenetic arrays were scanned using a GeneArray Scanner from Affymetrix. The scan was captured as a pixel image using standard AFFYMETRIX® software (GCOS v1.6 using parameter: Percentile v6.0), which reduces the data to an individual row in a text-encoded table for each probe. See Table 3.

TABLE 3 Exemplary Display of Array Data
[INTENSITY]
NumberCells = 506944
CellHeader = X Y NPIXELS MEAN STDV
047.9 025 167.0 11060.2 025 4293.0243.7 036 179.3 3681.5 025 4437.0

Each analysis system had approximately 1,016,000 cells, with one probe sequence per cell. The scanner recorded the signal intensity across the array, which ranges from 0 to 65,000 arbitrary units (a.u.), in a regular grid with ~30-45 pixels per cell. A 2-pixel margin was used between adjacent cells, leaving approximately 25-40 pixels of usable signal per probe. From these pixels, the AFFYMETRIX® software computed the 75th percentile pixel intensity (denoted as the “MEAN”), the standard deviation of signal intensity among the approximately 25-40 pixels (denoted as the “STDV”), and the number of pixels used per cell (denoted as “NPIXELS”). Any cells containing pixels that were three standard deviations apart in signal intensity were classified as outliers.
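A sketch of the per-cell summary, assuming the ~25-40 usable pixel intensities of a cell are already available with the 2-pixel margin excluded. The percentile interpolation, the use of the sample standard deviation, and the reading of the outlier rule as "any pixel more than three standard deviations from the cell mean" are all assumptions; the AFFYMETRIX® software's own conventions may differ. The names are hypothetical.

```python
import statistics
from typing import NamedTuple, Sequence


class CellSummary(NamedTuple):
    mean: float    # "MEAN": 75th percentile of the pixel intensities
    stdv: float    # "STDV": standard deviation of the pixel intensities
    npixels: int   # "NPIXELS": number of pixels summarized
    outlier: bool  # hypothetical outlier flag (one reading of the 3-SD rule)


def summarize_cell(pixels: Sequence[float]) -> CellSummary:
    """Summarize the usable pixels of one cell (requires at least two pixels)."""
    # 75th percentile using 'inclusive' interpolation (an assumed convention).
    p75 = statistics.quantiles(pixels, n=4, method="inclusive")[2]
    sd = statistics.stdev(pixels)  # sample standard deviation (assumed)
    center = statistics.fmean(pixels)
    # Flag the cell if any pixel lies more than 3 SD from the cell's mean pixel.
    outlier = any(abs(p - center) > 3 * sd for p in pixels)
    return CellSummary(p75, sd, len(pixels), outlier)
```

A cell of 24 pixels at 10 a.u. and one pixel at 1000 a.u. would be flagged under this reading, while a smoothly varying cell would not.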

The analysis systems were divided into a user-defined number of horizontal and vertical divisions. By default, four horizontal and four vertical divisions were created, resulting in 16 regularly spaced sectors for independent background subtraction. The background intensity was computed independently for each sector as the average signal intensity of the least intense 2% (by default) of probes in that sector. The background intensity was then subtracted from all probes in the sector before further computation.
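The sector-wise variant could look like the following sketch, which assumes the probe intensities are laid out as a 2-D grid matching their physical positions on the array. Sector boundaries come from integer division of the grid, so sectors may differ by one row or column when the grid size is not a multiple of the number of divisions; this rounding choice, and the function name, are assumptions.

```python
from typing import List


def sector_background_subtract(grid: List[List[float]], row_divisions: int = 4,
                               col_divisions: int = 4,
                               fraction: float = 0.02) -> List[List[float]]:
    """Subtract a per-sector background from a 2-D grid of probe intensities."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for si in range(row_divisions):
        r0, r1 = si * n_rows // row_divisions, (si + 1) * n_rows // row_divisions
        for sj in range(col_divisions):
            c0, c1 = sj * n_cols // col_divisions, (sj + 1) * n_cols // col_divisions
            values = sorted(grid[r][c] for r in range(r0, r1) for c in range(c0, c1))
            # Background: mean of the least intense `fraction` of the sector (>= 1 probe).
            k = max(1, int(len(values) * fraction))
            background = sum(values[:k]) / k
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = grid[r][c] - background
    return out
```

The defaults reproduce the 4 × 4 = 16 sectors and 2% cut-off described above.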

The noise value was estimated according to recommendations in the AFFYMETRIX® GeneChip User Guide v3.3. Noise (N) arises from variations in pixel intensity signals observed by the scanner as it reads the array surface and was calculated as the standard deviation of the pixel intensities within each of the identified background cells divided by the square root of the number of pixels comprising that cell. The average of the resulting quotients was used for N in the calculations described below:

$N = \frac{1}{scalarB}\sum\limits_{i \in B}\frac{S_{i}}{\sqrt{pix_{i}}}$

where

B is the set of background cells

S_(i) is the standard deviation among the pixels in background cell i

pix_(i) is the count of pixels in background cell i

scalarB is the total count of background cells
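The noise equation translates directly into code. In this sketch (the function name is hypothetical), each background cell is supplied as its pixel standard deviation S_(i) and pixel count pix_(i).

```python
import math


def estimate_noise(background_cells):
    """Noise N from background cells.

    background_cells: iterable of (stdv, npixels) pairs, one per background
    cell, i.e. S_i and pix_i in the noise equation.
    """
    quotients = [stdv / math.sqrt(npix) for stdv, npix in background_cells]
    # Dividing by the number of quotients is dividing by scalarB.
    return sum(quotients) / len(quotients)


print(estimate_noise([(10.0, 25), (20.0, 25), (30.0, 36)]))  # (2 + 4 + 5) / 3, about 3.67
```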

The intensities of all probes were then scaled so that the average observed signal intensity of the spiked-in control probes had a pre-determined signal strength. This was accomplished by finding a scaling factor (S_(f)) that forces the mean response of the corresponding PM probes to a target mean, using the equation below:

$S_{f} = \bar{e}_{t} \Big/ \frac{\sum\limits_{i \in K_{pm}} e_{i}}{scalarK_{pm}}$

where

ē_(t)=targeted mean intensity (default: 2500)

K_(pm)=set of PM probes complementing any spike-in

e_(i)=intensity of probe i

scalarK_(pm)=count of probes complementing any spike-in

S_(f)=scaling factor

Typically, the pre-determined signal strengths ranged from about 0 to about 65,000. Once the scaling factor was derived, all cell intensities were multiplied by the scaling factor.

The noise (N) was scaled by the same factor: N_(s)=N×S_(f), where N_(s)=scaled noise, N=unscaled noise, and S_(f)=scaling factor.
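The scaling step and the matching noise scaling can be sketched as below, assuming intensities are stored in a dictionary keyed by a hypothetical probe identifier. The default target of 2500 is the ē_(t) default given above; the function name is hypothetical.

```python
def scale_intensities(intensities, spike_in_pm_ids, noise, target_mean=2500.0):
    """Scale all probe intensities so spike-in PM probes average target_mean.

    intensities: dict probe_id -> background-corrected intensity.
    spike_in_pm_ids: ids of PM probes complementing any spike-in (K_pm).
    noise: unscaled noise N.
    Returns (scaled_intensities, scaled_noise N_s, scaling_factor S_f).
    """
    spike_values = [intensities[p] for p in spike_in_pm_ids]
    mean_spike = sum(spike_values) / len(spike_values)
    sf = target_mean / mean_spike
    scaled = {p: v * sf for p, v in intensities.items()}
    return scaled, noise * sf, sf


scaled, ns, sf = scale_intensities(
    {"a": 1000.0, "b": 1500.0, "c": 300.0}, ["a", "b"], noise=3.0)
print(sf, ns, scaled["c"])  # 2.0 6.0 600.0
```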

As an alternative or optional step, mismatch (MM) probes with high hybridization signal responses were identified, and the probe pairs were eliminated where:

$\left[\left(\frac{MM}{PM} > srt_{\gamma}\right) \wedge \left(MM - PM > N_{s} \times sdtm_{\gamma}\right)\right] \vee \left[PM \in O\right] \vee \left[MM \in O\right]$

where:

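N_(s)=scaled noise

O=outlier set

srt_(γ)=ratio threshold for rejecting probe pairs with a high MM response

sdtm_(γ)=difference threshold multiplier for rejecting probe pairs with a high MM response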

The remaining probe pairs were scored by:

$\left(\frac{PM}{MM} > srt\right) \wedge \left(PM - MM > N_{s}^{2} \times sdtm\right)$

where

PM=scaled intensity of the perfect match probe

MM=scaled intensity of the mismatch probe

srt=standard ratio threshold (default: 1.3)

sdtm=standard difference threshold multiplier (default: 130)

N_(s)=scaled noise
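The optional high-MM rejection rule and the probe-pair scoring rule can be combined into a short sketch. The function names are hypothetical; srt and sdtm use the stated defaults (1.3 and 130), while srt_(γ) and sdtm_(γ) are left as required arguments because no defaults are given. The sketch assumes positive scaled intensities, so each ratio test is rewritten as a product to avoid division by zero.

```python
def is_rejected(pm, mm, ns, srt_gamma, sdtm_gamma, in_outlier_set=False):
    """High-MM / outlier rejection rule (optional step).

    in_outlier_set: True if the PM or the MM cell belongs to the outlier set O.
    """
    high_mm = mm > srt_gamma * pm and (mm - pm) > ns * sdtm_gamma
    return high_mm or in_outlier_set


def is_positive(pm, mm, ns, srt=1.3, sdtm=130.0):
    """Positive probe pair: PM/MM > srt and PM - MM > N_s**2 * sdtm."""
    return pm > srt * mm and (pm - mm) > ns ** 2 * sdtm


def score_probe_pairs(pairs, ns, srt_gamma, sdtm_gamma, outliers=frozenset()):
    """pairs: list of (pair_id, pm, mm); outliers: ids whose PM or MM cell is in O.

    Returns {pair_id: True/False} for the probe pairs that survive rejection.
    """
    scores = {}
    for pair_id, pm, mm in pairs:
        if is_rejected(pm, mm, ns, srt_gamma, sdtm_gamma, pair_id in outliers):
            continue
        scores[pair_id] = is_positive(pm, mm, ns)
    return scores


# Hypothetical scaled intensities and thresholds.
pairs = [("A", 200.0, 100.0), ("B", 120.0, 100.0),
         ("C", 100.0, 300.0), ("D", 500.0, 10.0)]
print(score_probe_pairs(pairs, ns=0.5, srt_gamma=1.5, sdtm_gamma=100.0,
                        outliers={"D"}))  # {'A': True, 'B': False}
```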

After an OTU was classified as “present”, the present call was propagated upward through the taxonomic hierarchy by considering any node (subfamily, family, order, etc.) as ‘present’ if at least one of its subordinate OTUs was present.
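Upward propagation of present calls amounts to marking every ancestor of a present OTU. The sketch below assumes the taxonomy is given as a hypothetical child-to-parent mapping; the function name is also hypothetical.

```python
def propagate_presence(parent_of, present_otus):
    """Propagate OTU 'present' calls to every ancestor node.

    parent_of: dict mapping each node (OTU, subfamily, family, ...) to its
               parent; the root has no entry.
    present_otus: OTUs called present.
    """
    present = set(present_otus)
    for otu in present_otus:
        node = parent_of.get(otu)
        # Walk upward: a node is present if any descendant OTU is present.
        while node is not None:
            present.add(node)
            node = parent_of.get(node)
    return present


# Hypothetical hierarchy.
parent_of = {"otu1": "sf1", "otu2": "sf1", "otu3": "sf2",
             "sf1": "fam1", "sf2": "fam1", "fam1": "order1"}
print(sorted(propagate_presence(parent_of, {"otu1"})))
# ['fam1', 'order1', 'otu1', 'sf1']
```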

Hybridization intensity was the measure of OTU abundance and was calculated in arbitrary units for each probe set as the trimmed average (maximum and minimum values removed before averaging) of the PM minus MM intensity differences across the probe pairs in a given probe set.
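The trimmed-average abundance measure can be sketched as follows, assuming exactly one maximum and one minimum PM minus MM difference are discarded per probe set (the function name is hypothetical).

```python
def hybridization_intensity(pm_values, mm_values):
    """Trimmed mean of PM - MM differences across one probe set."""
    diffs = sorted(pm - mm for pm, mm in zip(pm_values, mm_values))
    if len(diffs) < 3:
        raise ValueError("need at least three probe pairs to trim max and min")
    trimmed = diffs[1:-1]  # drop the single lowest and highest difference
    return sum(trimmed) / len(trimmed)


print(hybridization_intensity([100, 200, 300, 400], [0, 0, 0, 0]))  # 250.0
```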

While preferred embodiments of the present application have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Example 5 Clean Room Sampling, PMA Processing and Phylogenetic Array Analysis

Two cleanroom facilities certified as Class 100K per Fed-Std-209E (equivalent to ISO 14644-1 Class 8) were examined. Mission-critical hardware and components were assembled in the first cleanroom site, the Jet Propulsion Laboratory's Spacecraft Assembly Facility (JPL-SAF), whereas the second cleanroom site, JPL cleanroom 144 (JPL-144), did not support mission-critical activities but maintained its Class 100K certification. Both cleanroom facilities were operated at positive pressure, with temperatures in the range of 20 ± 4° C. and relative humidity ranging from 30 to 50%. In both cases, the total hydrocarbon content of the facility air (gases and vapors) was below 15 ppm (calculated as methane). The Class 100K environment typically did not exceed 10K particulate obstruction levels during 30 days of assembly/integration activities. Measured non-volatile residue levels did not exceed 0.3 μg/cm².

For convenience and differentiation, the sequence clusters resulting from 454 pyrosequencing-based sequence discrimination are referred to as “molecular operational taxonomic units (MOTUs).” Variation in DNA sequence among microbes can arise from naturally occurring evolutionary events and/or methodological errors (e.g., homopolymer repetition in pyrosequencing). The goal of the MOTU-based classification and clustering system presented here, and described in detail elsewhere (Blaxter & Floyd, TRENDS in Ecology and Evolution 18: 268-269, 2003), is to separate these two sources of sequence variation based on known error rates in sequencing and measured levels of difference across various taxonomic schemes. The accuracy and specificity of a MOTU-based system can be derived from measured levels of between-taxa and within-group variation in well-defined populations, and from the observational error obtained by re-sequencing (Blaxter & Floyd, 2003). In a similar vein, PhyloChip DNA microarrays employ multiple ~25-nt probes that collectively represent the full-length ~1.5-kb 16S rRNA gene of each taxon. In this study, PhyloChip-derived operational taxonomic units (OTUs) are delineated in accordance with the hybridization scores of a given set of 37 or more specific 25-mer probes, which were previously designed based on their prevalence among members of a given OTU and their dissimilarity to DNA sequences outside of that OTU. Ultimately, a microorganism can be assigned to only one MOTU (via similarity within a homologous sequenced DNA fragment) or to only one hybridization-based OTU, but neither MOTUs nor OTUs need be congruent with other taxonomic schemes.

PhyloChip G3 (DNA Microarray, Generation 3) Analysis:

Bacterial 16S rRNA genes were amplified from DNA preparations from each sample using the primers 27F (5′-AGA GTT TGA TCC TGG CTC AG) and 1492R (5′-GGT TAC CTT GTT ACG ACT T). PCR conditions were as follows: 1 cycle of initial melting for 3 min at 95° C., followed by 35 cycles of 30 sec melting at 95° C., 30 sec annealing over a 48-57.5° C. gradient, and 2 min extension at 72° C., with a final 10 min incubation at 72° C. To maximize observed diversity, four separate PCR reactions were performed for each sample using a gradient of annealing temperatures (48° C., 50.1° C., 54.4° C., and 57.5° C.). Archaeal 16S rRNA genes were amplified with primers 4aF and 1492uR, and the resulting amplicons were purified via gel excision when quantifiable amounts of product had been generated. A maximum of 450 ng of PCR product from each sample was used for phylogenetic microarray analysis. A detailed explanation of the processing of the PhyloChip assay has been described elsewhere (Hazen, et al., Science 330: 204-208, 2010). Briefly, pooled PCR products from each sampling event were spiked with known amounts of synthetic control 16S rRNA gene fragments and non-16S rRNA gene fragments. Fluorescence intensities arising from these controls were used as standards for normalization among samples. Target fragmentation, biotin labeling, PhyloChip hybridization, scanning and staining, as well as background subtraction, noise calculation, detection, and quantification criteria, were performed as previously reported (DeSantis, et al., Microb. Ecol. 53: 371-383, 2007; Hazen, et al., 2010). OTUs were deemed present if the quartiles of the ranked r scores (response scores that indicate the potential of a probe pair to respond to a target rather than to the background) met the following criteria: rQ1 ≥ 0.70, rQ2 ≥ 0.95, and rQ3 ≥ 0.98 (stage 1 analysis).
In an additional stage 2 analysis, subfamilies with an rxQ3 value (cross-hybridization-adjusted response score) of ≥0.6 were deemed present, which was a further requirement for calling OTUs within that subfamily.
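The stage 1 and stage 2 calling rules can be expressed as threshold checks on r-score quartiles. This is a sketch under assumptions: the r scores for an OTU's probe pairs are taken as already computed (as described in DeSantis et al., 2007), quartiles use Python's `statistics.quantiles` with the inclusive method (the original interpolation rule is not stated), the subfamily rxQ3 value is passed in directly, and the function names are hypothetical.

```python
import statistics

# Stage 1 thresholds for the quartiles rQ1, rQ2, rQ3 of an OTU's r scores.
STAGE1_THRESHOLDS = (0.70, 0.95, 0.98)


def stage1_present(r_scores, thresholds=STAGE1_THRESHOLDS):
    """True if the r-score quartiles meet every stage 1 threshold."""
    # method="inclusive" is an assumption; needs at least two r scores.
    quartiles = statistics.quantiles(r_scores, n=4, method="inclusive")
    return all(q >= t for q, t in zip(quartiles, thresholds))


def stage2_present(r_scores, subfamily_rxq3, rxq3_threshold=0.6):
    """Stage 1 call plus the subfamily rxQ3 >= 0.6 requirement."""
    return stage1_present(r_scores) and subfamily_rxq3 >= rxq3_threshold


strong = [0.99] * 37                # hypothetical OTU with 37 probe pairs
weak = [0.5] * 10 + [0.99] * 27
print(stage1_present(strong), stage1_present(weak))  # True False
print(stage2_present(strong, subfamily_rxq3=0.5))    # False
```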

PhyloChip G3 Data Processing:

Following stage 2 analysis, hybridization intensities were log2 × 1000 transformed for linearization. Subfamily filtering was performed manually by choosing representative OTUs of each subfamily depending on how often an OTU was called present throughout the samples. OTUs and subfamilies were subsequently clustered and classified in accordance with the recently released Greengenes taxonomy (McDonald, et al., ISME J, 2011). To better understand the phylogenetic relationships of detected OTUs/subfamilies, multiple sequence alignments of representative sequences were retrieved from the SILVA database (Pruesse, et al., Nucleic Acids Res. 35: 7188-7196, 2007). Neighbor-joining phylogenetic trees were compiled based on the 70,000-character alignments of each OTU via MEGA4 software (Tamura, et al., Mol. Biol. Evol. 24: 1596-1599, 2007). The trees were then combined with corresponding heatmaps (either presence/absence or relative abundance) and rendered in iTOL (Letunic & Bork, Bioinformatics 23: 127-128, 2007). Environmental clustering (non-metric multidimensional scaling, N-MDS) based on abundance scores and Bray-Curtis distance was performed using the R programming environment (www.r-project.org, vegan and MASS packages). Weighted principal component analysis was performed using the Fast UniFrac interface, where OTUs were grouped into subfamilies and the number of distinct OTUs per subfamily served as the weighting (Hamady, et al., ISME J 4: 17-27, 2010). Pearson correlations of abundance values between observed OTUs and various environmental factors (pH, temperature) were generated with Microsoft Excel (2007). Correlations with |r| > 0.877 for comparisons among three samples and |r| > 0.811 for comparisons among four samples were considered statistically significant (null hypothesis: OTUs do not correlate with the environmental factor tested). Graphs were constructed with the SigmaPlot 10.0 software suite.
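Because the N-MDS ordination was run on Bray-Curtis distances between abundance profiles, a small sketch of that dissimilarity may help. It computes only the pairwise distance matrix; the ordination itself was performed with the R vegan package and is not reproduced here. Function names, sample names and abundance vectors are hypothetical, and profiles are assumed non-negative and aligned by OTU.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def distance_matrix(profiles):
    """profiles: dict sample -> abundance vector (same OTU order in each)."""
    names = sorted(profiles)
    return {(a, b): bray_curtis(profiles[a], profiles[b])
            for a in names for b in names}


profiles = {"floor": [2.0, 4.0, 0.0], "gse": [4.0, 4.0, 1.0]}
d = distance_matrix(profiles)
print(d[("floor", "gse")])  # 0.2
```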

PhyloChip G3 DNA microarray analyses and associated biostatistical processing were carried out as described recently (Hazen et al., “Deep-sea oil plume enriches indigenous oil-degrading bacteria,” Science 330(6001):204-208 (2010); Cooper et al., “Comparison of innovative molecular approaches and standard spore assays for assessment of surface cleanliness,” Appl. Environ. Microbiol., 77(15):5438-5444 (2011), the contents of which are hereby expressly incorporated by reference). OTUs that were called present in negative control samples (e.g., sampling blanks, handling controls) were removed from the entire data set. Non-metric multidimensional scaling was performed on abundance values of the resulting OTUs using the R programming environment (vegan package, http://www.r-project.org/). Phylogenetic classification of OTUs was performed using Greengenes (DeSantis, et al., Appl Environ Microbiol 72: 5069-5072, 2006) in combination with SILVA (Pruesse, et al., 2007) and RDP II (Cole, et al., 2005). OTUs representing the majority of the living bacterial population were identified based on two criteria: A) the OTU should be called present by the PhyloChip presence/absence analysis, and B) the OTU should have a higher abundance value in the PMA-treated sample than in the non-PMA-treated sample. Of the 1,277 OTUs detected in total, those exhibiting higher abundance in PMA-treated samples were selected and grouped into 126 OTUs at the genus level. Phylogenetic trees were constructed in MEGA 4 (Tamura, et al., 2007) based on a 70,000-character alignment (SILVA; Pruesse, et al., 2007) and the neighbor-joining method. Heat maps of abundance ratios and trees were rendered in iTOL (Letunic & Bork, 2007).
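The two selection criteria, together with removal of OTUs called present in negative controls, can be sketched as a set filter. Function and argument names are hypothetical, and OTUs missing from an abundance table are treated as having zero abundance (an assumption).

```python
def select_viable_otus(present, pma_abundance, untreated_abundance, control_otus):
    """Return OTUs meeting criteria A and B after negative-control removal.

    present: OTUs called present by the PhyloChip analysis (criterion A).
    pma_abundance / untreated_abundance: dict OTU -> abundance value.
    control_otus: OTUs called present in negative controls (removed).
    """
    candidates = set(present) - set(control_otus)
    return sorted(
        otu for otu in candidates
        # Criterion B: higher abundance with PMA than without.
        if pma_abundance.get(otu, 0.0) > untreated_abundance.get(otu, 0.0)
    )


print(select_viable_otus(
    present={"a", "b", "c", "d"},
    pma_abundance={"a": 5000, "b": 3000, "c": 4000, "d": 9000},
    untreated_abundance={"a": 4000, "b": 3500, "c": 1000, "d": 1},
    control_otus={"d"},
))  # ['a', 'c']
```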

454 Pyrosequencing and Data Analysis:

The 16S universal bacterial primers 28F (5′-GAG TTT GAT CNT GGC TCA G-3′; SEQ ID NO: 101) and 519R (5′-GTN TTA CNG CGG CKG CTG-3′; SEQ ID NO: 102) were used to PCR-amplify the ~500-bp V1 to V3 hypervariable region of the 16S rRNA gene. Bacterial tag-encoded FLX amplicon pyrosequencing (bTEFAP) was performed as described previously (Dowd, et al., 2008). In preparation for FLX-Titanium sequencing (Roche, Nutley, N.J.), template DNA fragment size and concentration were accurately measured with a TBS-380 Fluorometer using PicoGreen fluorescent dye (Turner Biosystems, CA, USA). The total volume of PCR amplicon used for emulsion PCR was 2 µL for strong positive (>15 pg/µL), 5 µL for weak positive (5 to 15 pg/µL), and 20 µL for negative (<5 pg/µL) PCR products from individual samples. This normalization step helped to minimize bias in the resulting sequence and/or tag frequency in favor of strong PCR products. A sample of 9.6 × 10⁶ double-stranded DNA molecules/mL (mean = 625 bp) was combined with 9.6 × 10⁶ DNA capture beads and then amplified by emulsion PCR. Following bead recovery and enrichment, the bead-attached DNAs were denatured with NaOH, and sequencing primers were annealed. A four-region 454 sequencing run was performed on a GS PicoTiterPlate (PTP) using the Genome Sequencer FLX System (Roche, Nutley, N.J.). A maximum of 40 distinctly barcoded samples were analyzed on each quarter region of the PTP. All FLX procedures were performed according to the manufacturer's instructions for the Genome Sequencer FLX System (Roche, Nutley, N.J.). All bTEFAP procedures were performed at the Research and Testing Laboratory (RTL; Lubbock, Tex.) in accordance with well-established protocols (www.researchandtesting.com).
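The emulsion PCR loading rule above is a simple three-way threshold on amplicon concentration. In this hypothetical helper, a concentration of exactly 5 or 15 pg/µL is treated as weak positive, following the stated 5 to 15 pg/µL range.

```python
def emulsion_pcr_volume_ul(concentration_pg_per_ul):
    """Amplicon volume (uL) loaded into emulsion PCR for a given concentration."""
    if concentration_pg_per_ul > 15:
        return 2   # strong positive (> 15 pg/uL)
    if concentration_pg_per_ul >= 5:
        return 5   # weak positive (5 to 15 pg/uL)
    return 20      # negative (< 5 pg/uL)


for conc in (17.0, 8.9, 3.75):
    print(conc, emulsion_pcr_volume_ul(conc))
```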

bTEFAP-Derived Bacterial Diversity and Data Analysis:

All resulting 454 pyrosequences were processed and analyzed using the MOTHUR software package (Schloss, et al., Appl Environ Microbiol 75: 7537-7541, 2009). The AmpliconNoise algorithm was applied to reduce sequencing error. Sequences were removed from the analysis if they i) did not contain the primer and/or barcode sequence, ii) had a total sequence length <200 bp, or iii) had a quality score <25. The filtered sequences were assigned to individual samples according to their previously engineered 12-nt barcodes. Unique sequences were aligned using the SILVA reference alignment (Schloss, 2009), and only the overlapping region was considered for analyses. After all chimeras were removed from consideration, the remaining high-quality sequences were clustered and classified according to the recently released Greengenes training set and taxonomy (McDonald, et al., 2011; Werner, et al., 2012). Sequences were clustered into molecular operational taxonomic units (MOTUs) at the 0.03 level of resolution.
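The three read-removal criteria and the barcode-based sample assignment can be illustrated with the sketch below. It is not MOTHUR's implementation: the read layout (12-nt barcode immediately followed by the forward primer), exact barcode matching, IUPAC-aware primer matching, and the use of the mean per-base quality score are assumptions, and all names are hypothetical.

```python
from statistics import mean

# IUPAC codes needed for the degenerate primers quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT",
         "K": "GT", "M": "AC", "R": "AG", "Y": "CT", "S": "CG", "W": "AT"}


def primer_matches(primer, seq):
    """True if seq begins with primer, honouring IUPAC degenerate codes."""
    return len(seq) >= len(primer) and all(
        base in IUPAC[code] for code, base in zip(primer, seq))


def filter_and_assign(reads, barcodes, primer, barcode_len=12, min_len=200, min_qual=25):
    """Keep reads passing the three filters and bin them by sample.

    reads: list of (sequence, per-base quality scores).
    barcodes: dict barcode -> sample name.
    """
    by_sample = {}
    for seq, quals in reads:
        sample = barcodes.get(seq[:barcode_len])
        if sample is None or not primer_matches(primer, seq[barcode_len:]):
            continue  # i) missing barcode and/or primer
        if len(seq) < min_len:
            continue  # ii) total length < 200 bp
        if mean(quals) < min_qual:
            continue  # iii) quality below 25 (mean per-base score assumed)
        by_sample.setdefault(sample, []).append(seq)
    return by_sample


primer_28f = "GAGTTTGATCNTGGCTCAG"
barcodes = {"ACGTACGTACGT": "GI-36-4"}  # hypothetical barcode
good = "ACGTACGTACGT" + "GAGTTTGATCATGGCTCAG" + "A" * 200
reads = [(good, [30] * len(good)),
         (good[:60], [30] * 60),                           # too short
         (good, [20] * len(good)),                         # low quality
         ("TTTTTTTTTTTT" + good[12:], [30] * len(good))]   # unknown barcode
print({k: len(v) for k, v in filter_and_assign(reads, barcodes, primer_28f).items()})
# {'GI-36-4': 1}
```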

Controls:

Very low amounts of PCR product (<2 pg/µL) were obtained for the BisKit blank (negative control). This indicates that the DNA associated with the sampling devices and reagents used in this study is present at extremely low concentrations and is extraneous in nature, rather than derived from intact bacterial cells. Because this extraneous DNA is not available for PCR after PMA treatment, PMA treatment further reduces the PCR products obtained from such blanks.

Lab controls and field blanks were also treated with PMA to assess the relative percent viability of contaminant microbiota. For all of these cleanroom control and blank samples, rrn copy numbers remained at 10² irrespective of whether they had been subjected to PMA treatment. The signals detected in these control samples, implying the presence of <100 viable contaminant organisms, were insignificant relative to those arising from the actual samples of interest and were thus ignored.

Results:

Total bacterial burden was assessed by universal qPCR, and various characteristics of the samples collected during this study were measured. When PMA was omitted from the molecular processing, the qPCR-derived total bacterial burden (viable + non-viable) of the floor and GSE samples collected in the mission-critical JPL-SAF was ~10⁶ rrn copies/m². After treatment with PMA, the viable bacterial population of JPL-SAF samples as measured via PMA-qPCR was ~7% and ~41% of the total for floors and GSE, respectively. Samples collected from the floor and GSE of the non-mission-critical JPL-144 facility yielded much higher rrn copy numbers (by more than one log) both with and without PMA treatment. However, the viable portion of the total bacterial population present in the JPL-144 samples was ~21% and ~7% for floor and GSE samples, respectively. Upon comparative analysis, JPL-144 floors housed a ~66.7-fold greater total bacterial burden than their mission-critical JPL-SAF counterparts. Similarly, JPL-144 GSE samples were ~15% more laden with total bacteria (viable + non-viable) than the JPL-SAF GSE materials.
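The viable percentages and fold differences above are simple ratios. Assuming percent viability is the PMA-qPCR copy number divided by the copy number obtained without PMA, the arithmetic can be sketched as follows; the helper names are hypothetical and the copy numbers in the example are illustrative, not measured values.

```python
def percent_viable(pma_copies, total_copies):
    """Viable share (%) = PMA-qPCR copies / qPCR copies without PMA x 100."""
    return 100.0 * pma_copies / total_copies


def fold_difference(burden_a, burden_b):
    """How many times larger burden_a is than burden_b."""
    return burden_a / burden_b


# Hypothetical rrn copies per m^2, not values measured in this study.
print(percent_viable(7.0e4, 1.0e6))  # 7.0
print(fold_difference(6.67e7, 1.0e6))
```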

454 Pyrosequencing:

The concentration of 500-bp amplicons resulting from pre-454 pyrosequencing PCR was <3.75 pg/µL for all samples treated with PMA, and >5 pg/µL for all samples when PMA treatment was omitted. A very strong PCR product (17 pg/µL) resulted from one untreated JPL-144 floor sample (GI-42-1), 2.3-fold higher than that arising from the corresponding JPL-SAF floor sample (GI-36-4). The untreated JPL-144 GSE material (GI-42-2) exhibited a 1.6-fold greater PCR amplicon concentration than its untreated mission-critical JPL-SAF GSE counterpart (GI-36-3). After PMA pretreatment, the mission-critical JPL-SAF floor sample yielded less PCR product (5.4 pg/µL) than the GSE material housed in the JPL-144 cleanroom (8.9 pg/µL). In general, irrespective of the type of sample analyzed, a considerable decrease in PCR yield was observed for all samples pretreated with PMA.

The number of MOTUs observed in the various samples was also examined. Overall, the JPL-144 GSE samples housed 4.7-fold more bacterial MOTUs than the mission-critical GSE samples, whereas the floor surfaces of the two cleanrooms yielded roughly equivalent numbers of MOTUs. The mission-critical JPL-SAF floor was rich in MOTUs affiliated with physiologically recalcitrant bacteria (Actinobacteria, Deinococci, Acidobacteria, and Firmicutes), whereas the JPL-144 cleanroom floor harbored predominantly proteobacterial MOTUs. It was particularly apparent that the few acidobacterial MOTUs (4 MOTUs) in JPL-SAF floor samples were present in great abundance (112 sequences). However, the GSE samples exhibited no such relationship between MOTU numbers and sequence occurrence.

Anywhere from 1 to 528 high-quality pyrosequences (>250 bp) were obtained from samples that had been pretreated with PMA, whereas 1,783 to 15,914 high-quality pyrosequences were recovered from untreated samples. Regardless of sample type, PMA-treated samples consistently yielded significantly fewer pyrosequences than their untreated counterparts. Even when PMA treatment was omitted, the mission-critical JPL-SAF cleanroom floor sample (GI-36-4) yielded far fewer pyrosequences (4,318) than the non-mission-critical JPL-144 cleanroom floor sample (GI-42-1; 15,914). Approximately 59% of all pyrosequences retrieved from these two distinct facility floor samples belonged to MOTUs present in both, though this shared fraction represented only 7.9% of the total observed MOTUs (69 out of 872). The relative abundance of pyrosequences retrieved from the untreated JPL-SAF samples was plotted as a Venn diagram. Even though equivalent surface areas were sampled from the floor and GSE of the JPL-SAF (18 m² each), only 6,101 pyrosequences were retrieved from the GSE samples, whereas the floor gave rise to 20,232 sequences. Between the JPL-SAF floor and GSE samples, ~38% of the total pyrosequences observed were shared, but these constituted a mere ~7.5% of the total MOTUs (43 out of 574).
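The shared-sequence and shared-MOTU fractions can be computed from per-sample MOTU count tables. In this sketch, a sequence counts as shared when its MOTU occurs in both samples (an assumption about how the percentages were tallied); the function name and toy tables are hypothetical. The last line checks the MOTU percentages quoted above.

```python
def shared_fractions(counts_a, counts_b):
    """counts_*: dict MOTU -> number of pyrosequences in each sample.

    Returns (fraction of sequences in shared MOTUs, fraction of MOTUs shared).
    """
    all_motus = set(counts_a) | set(counts_b)
    shared = set(counts_a) & set(counts_b)
    total_seqs = sum(counts_a.values()) + sum(counts_b.values())
    shared_seqs = sum(counts_a[m] + counts_b[m] for m in shared)
    return shared_seqs / total_seqs, len(shared) / len(all_motus)


print(shared_fractions({"m0": 1, "m1": 1, "m2": 1}, {"m0": 1, "m3": 1}))  # (0.4, 0.25)
print(round(100 * 69 / 872, 1), round(100 * 43 / 574, 1))  # 7.9 7.5
```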

The MOTUs shared between two distinct sample sets, for example floors of JPL-SAF vs. JPL-144 or JPL-SAF floors vs. JPL-SAF GSE, were comparatively analyzed. Whenever the number of representative sequences per MOTU differed by fewer than 10 between the two sample sets, the comparison was deemed moot and that MOTU was omitted from consideration. Following this subtraction, the cleanroom floor comparative sample set retained ~92% of the total pyrosequences observed; however, 87.5% of the total MOTUs (711 out of 872) were eliminated from this calculation. Similarly, ~40% of the total pyrosequences compared in the JPL-SAF floor vs. GSE sample set were eliminated, corresponding to a 77.5% loss of total MOTUs (445 out of 574).
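The difference-of-10 filter can be sketched as follows, assuming a MOTU absent from one sample set has a count of zero there. The helper name and toy counts are hypothetical; the final line reproduces the 77.5% figure from 445 of 574 MOTUs.

```python
def informative_motus(counts_a, counts_b, min_difference=10):
    """MOTUs whose sequence counts differ by at least min_difference."""
    motus = set(counts_a) | set(counts_b)
    return {m for m in motus
            if abs(counts_a.get(m, 0) - counts_b.get(m, 0)) >= min_difference}


print(sorted(informative_motus({"x": 50, "y": 3, "z": 12}, {"x": 5, "y": 4})))  # ['x', 'z']
print(round(100 * 445 / 574, 1))  # 77.5
```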

Closer observation of sequences from JPL-SAF cleanroom floor and GSE samples indicates that they predominantly belong to hardier organisms such as Acidobacteria sp., Actinobacteria sp., Arsenicicoccus sp., Arthrobacter sp., Corynebacterium sp., Kineococcus sp., Propionibacterium sp., Nocardioides sp., Streptomyces sp., Bacillus sp., Clostridium sp., Lactobacillus sp., Deinococcus sp., and Staphylococcus sp.

PMA treatment markedly reduced the total number of sequences (to 89 and 1) and MOTUs (to 17 and 1) for the JPL-SAF and JPL-144 cleanroom floor samples, respectively. The PMA effect was considerably stronger in the non-mission-critical floor samples (447 MOTUs reduced to 1), indicating that a large number of dead cells or extraneous DNA was present on these floors. The viable population belonged to the taxa Firmicutes, Actinobacteria and Proteobacteria. A similar trend was observed in PMA-treated GSE samples from JPL-SAF and JPL-144 (4 and 42 MOTUs; 14 and 528 sequences, respectively). The hardy organisms observed in cleanroom floor samples without PMA treatment were absent or present in very low numbers in PMA-treated samples.

As described above, the system (e.g., PMA-microarrays) and the methodsdisclosed herein can be used to accurately and sensitively detect andquantify viable organisms in a sample.

What is claimed is:
 1. A method for detecting live cells in a sample comprising: selectively amplifying a nucleic acid from live cells from a sample comprising live and dead cells; and detecting the presence, absence, relative abundance, and/or quantity of one or more operational taxon units (OTUs) in the sample based on hybridization of amplified nucleic acid to a plurality of probes complementary to 16S rRNA sequences, wherein said one or more OTUs consist essentially of live cells from said sample.
 2. The method of claim 1, wherein said nucleic acid is DNA.
 3. The method of claim 1, wherein said selectively amplifying comprises pre-treating the sample with an agent that selectively modifies a nucleic acid of dead cells.
 4. The method of claim 3, wherein the agent that selectively modifies a nucleic acid of dead cells is a DNA intercalating agent.
 5. The method of claim 4, wherein said DNA intercalating agent is propidium monoazide.
 6. The method of claim 1, wherein said probes are used to detect the presence, absence, relative abundance, and/or quantity of at least 10,000 different OTUs in a single assay.
 7. The method of claim 1, wherein said presence, absence, relative abundance, and/or quantity is detected with a confidence level greater than 95%.
 8. The method of claim 1, further comprising quantifying the number of live cells in the sample.
 9. A method for detecting live cells in a sample comprising: selectively amplifying a nucleic acid from live cells from a sample comprising live and dead cells; and determining the presence, absence, relative abundance, and/or quantity of at least 1,000 different OTUs in a single assay, wherein said OTUs consist essentially of live cells from said sample.
 10. A method for detecting live cells in a sample comprising: (a) selectively amplifying a nucleic acid from live cells from a sample comprising live and dead cells; (b) hybridizing amplified nucleic acid to a plurality of probes; (c) determining hybridization signal strength distributions for a plurality of different interrogation probes, each of which is complementary to a section within one or more highly conserved polynucleotides in one or more target OTUs; (d) determining hybridization signal strength distributions for a plurality of mismatch probes, wherein for each interrogation probe, one or more different corresponding mismatch probes comprising one or more nucleotide mismatches with said section within said one or more highly conserved polynucleotides are included in the plurality of mismatch probes; and (e) using the hybridization signal strengths of the interrogation probes and mismatch probes to determine the probability that the hybridization signal for the different interrogation probes represents the presence, absence, relative abundance, and/or quantity of said one or more OTUs, wherein said one or more OTUs consist essentially of live cells from said sample.
 11. The method of claim 9 or 10, wherein said selectively amplifying comprises pre-treating the sample with an agent that selectively modifies a nucleic acid of dead cells.
 12. The method of claim 11, wherein the agent that selectively modifies a nucleic acid of dead cells is a DNA intercalating agent.
 13. The method of claim 12, wherein said DNA intercalating agent is propidium monoazide.
 14. The method of claim 9 or 10, wherein contacting said sample with an agent that selectively modifies a nucleic acid of dead cells in said sample is followed by exposure to visible light.
 15. The method of claim 10, wherein said highly conserved polynucleotides are selected from the group consisting of 16S rRNA gene, 23S rRNA gene, 5S rRNA gene, 5.8S rRNA gene, 12S rRNA gene, 18S rRNA gene, 28S rRNA gene, gyrB gene, rpoB gene, fusA gene, recA gene, coxl gene, nif13 gene, RNA molecules derived therefrom, and a combination thereof.
 16. The method of claim 10, wherein each interrogation probe has 4 or more corresponding mismatch probes in the plurality of mismatch probes.
 17. The method of claim 10, wherein said probes are used to detect the presence, absence, relative abundance, and/or quantity of at least 10,000 different OTUs in a single assay.
 18. The method of claim 10, wherein said probes are attached to a substrate.
 19. The method of claim 18, wherein said substrate comprises a bead or a microsphere.
 20. The method of claim 18, wherein said substrate comprises glass, plastic, or silicon.
 21. The method of claim 9 or 10, wherein said presence, absence, relative abundance, and/or quantity is detected with a confidence level greater than 95%.
 22. The method of claim 9, further comprising quantifying the number of live cells in the sample.
 23. A kit for detecting live cells in a sample comprising: (a) an agent that selectively modifies a nucleic acid of dead cells; (b) a plurality of different interrogation probes, each of which is complementary to a section within one or more highly conserved polynucleotides in one or more target operational taxon units (OTUs); and, (c) a plurality of mismatch probes, wherein for each interrogation probe, one or more different corresponding mismatch probes comprising one or more nucleotide mismatches with said section within said one or more highly conserved polynucleotides are included in the plurality of mismatch probes.
 24. The kit of claim 23, wherein the agent that selectively modifies a nucleic acid of dead cells is a DNA intercalating agent.
 25. The kit of claim 24, wherein said DNA intercalating agent is propidium monoazide.
 26. The kit of claim 23, wherein each interrogation probe has 4 or more corresponding mismatch probes in the plurality of mismatch probes.
 27. The kit of claim 23, wherein said highly conserved polynucleotides are selected from the group consisting of 16S rRNA gene, 23S rRNA gene, 5S rRNA gene, 5.8S rRNA gene, 12S rRNA gene, 18S rRNA gene, 28S rRNA gene, gyrB gene, rpoB gene, fusA gene, recA gene, coxl gene, nif13 gene, RNA molecules derived therefrom, and a combination thereof.